Abstract

The spectrotemporal receptive field (STRF) provides a versatile and integrated
(spectral and temporal) functional characterization of single cells in primary auditory
cortex (AI). We explore in this paper the origins of, and relationships between, several
different ways of measuring and analyzing the STRF. Specifically, we demonstrate that
STRFs measured using a spectrotemporally diverse array of broadband stimuli, such
as dynamic ripples, spectrotemporally white noise (STWN), and temporally orthogonal
ripple combinations (TORCs), are very similar, confirming earlier findings that
the STRF is a robust linear descriptor of the cell. We also present a new deterministic
analysis framework that employs the Fourier series to describe the spectrotemporal
modulation frequency content of the stimuli and responses. Additional insights into
the STRF measurements, including the nature and interpretation of measurement
errors, are presented using the Fourier transform, coupled to singular-value decomposition
(SVD), and variability analyses including bootstrap. The results promote the utility
of the STRF as a core functional descriptor of neurons in AI.

Key  Words:  spectrotemporal  receptive  field,  modulation  transfer  function,  auditory 
cortex,  linearity,  stimulus  invariance,  ripple,  variability,  singular-value  decomposition,  ferret 
1  Introduction

It has been over twenty years since the spectrotemporal receptive field (STRF) was conceived
to overcome limitations, both empirical and theoretical, associated with traditional functional
measurements of auditory neurons (Hermes et al., 1981; Aertsen and Johannesma,
1981b). The traditional approach was associated with simple, easily manipulable stimuli,
such as clicks and tones, and relied on the experimenter to impose certain conditions and
then observe their effects. In contrast, the STRF was associated with more diverse stimuli,
characterized by randomly varying conditions, and an approach labeled reverse correlation,
by which the neuron informs the experimenter, via action potentials, of the conditions that
were of interest to it (de Boer and de Jongh, 1978; Eggermont et al., 1983b). Additionally,
while previous measurements typically probed neurons’ sensitivities to the spectral or
temporal dimensions of sound separately, the STRF was based on the more general thesis
that the interdependence of these dimensions forms an irreducible Gestalt part of a neuron’s
sensitivity (Smolders et al., 1979; Eggermont et al., 1981; Johannesma and Eggermont,
1983). Furthermore, the STRF neatly fit within a rigorous analytical framework, bolstered
by the fields of time-frequency analysis (Cohen, 1995) and nonlinear systems theory (Eggermont,
1993), within which the functionality of neurons could, in principle, be systematically
explored to any level of detail.

The  STRF  describes  the  linear  relationship  between  the  time-dependent  spike  rate  of  a 
neuron  and  the  time-  and  frequency-dependent  energy  —  in  short,  the  dynamic  spectrum  — 
of  a  stimulus.  In  order  to  measure  the  STRF,  the  reverse-correlation  approach  prescribes 
computing  the  average  dynamic  spectrum  of  those  portions  of  a  stimulus  preceding  the 
neuron’s  spikes.  In  this  context,  the  STRF  is  commonly  interpreted  as  the  spectrotemporal 
pattern  that  optimally  activates  a  neuron  (Young,  1998).  Theoretically,  as  long  as  all  patterns 
occur  randomly,  independently,  and  equiprobably,  the  STRF  can  be  revealed  by  this  “spike- 
triggered  average”  (Eggermont,  1993). 
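The spike-triggered average described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `spike_triggered_average`, the array layout (time along the first axis, frequency channel along the second), and the toy data are all assumptions made here for clarity.

```python
import numpy as np

def spike_triggered_average(spec, spike_counts, n_lags):
    """Average the stimulus segments that precede each spike.

    spec         : array (n_time, n_chan), dynamic spectrum S[t, x]
    spike_counts : array (n_time,), number of spikes in each time bin
    n_lags       : number of time bins of stimulus history to keep
    Returns an array (n_lags, n_chan); row tau holds the mean of
    S[t - tau, x] taken over all spikes at time t.
    """
    spec = np.asarray(spec, dtype=float)
    n_time, n_chan = spec.shape
    sta = np.zeros((n_lags, n_chan))
    n_spikes = 0
    # Spikes without a full stimulus history are skipped (an assumption).
    for t in range(n_lags - 1, n_time):
        count = spike_counts[t]
        if count > 0:
            # Rows of `history` are ordered tau = 0, 1, ..., n_lags - 1.
            history = spec[t - n_lags + 1 : t + 1][::-1]
            sta += count * history
            n_spikes += count
    return sta / max(n_spikes, 1)

# Toy data: 6 time bins, 2 channels, spikes in bins 3 and 5.
spec = np.arange(12.0).reshape(6, 2)
spikes = np.array([0, 0, 0, 1, 0, 1])
sta = spike_triggered_average(spec, spikes, n_lags=2)
```

With a stimulus whose patterns are not random, independent, and equiprobable, this average would generally differ from the STRF, which is the issue the later sections address.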

Although  the  STRF  has  been  slow  to  mature,  it  is  now  increasingly  used  to  study  the 
physiology  of  central  auditory  neurons.  In  retrospect,  the  often  slow  pace  of  progress  can 
be  partially  attributed  to  the  reverse-correlation  methodology,  which  remains  fairly  opaque. 
In  particular,  reverse  correlation  provides  no  straightforward  formal  basis  for  describing  the 
effectiveness  of,  or  relations  between,  specific  stimuli,  because  only  the  average  statistics  of 
stimuli  are  specified.  For  example,  Gaussian  broad-band  noise,  an  ideal  stimulus  from  the 
reverse-correlation  standpoint,  is  often  ineffective  when  applied  to  central  auditory  neurons 
(but see Keller and Takahashi, 2000). Meanwhile, a range of other stimuli and associated
techniques have been auditioned, many of which have fared better, including randomly modulated
broad-band noise (Miller et al., 2002; Escabí and Schreiner, 2002), random sequences
of  tones  or  chords  (Aertsen  and  Johannesma,  1981a;  Epping  and  Eggermont,  1985;  Schafer 
et  al.,  1992;  deCharms  et  al.,  1998;  Theunissen  et  al.,  2000;  Rutkowski  et  al.,  2002),  and 
natural  stimuli  (Aertsen  and  Johannesma,  1981a;  Yeshurun  et  al.,  1987;  Schafer  et  al.,  1992; 
Theunissen  et  al.,  2000;  Sen  et  al.,  2001).  While  it  is  sometimes  implied  that  the  auditory 
system  processes  different  stimuli  differently,  it  has  not  been  made  clear,  because  of  the  lack 
of  vocabulary,  to  what  extent  different  stimulation  methods  should  yield  different  results. 
Additionally, most of the employed stimuli share randomness in their spectrotemporal design,
in accordance with the reverse-correlation approach, but this style of stimulation is
bound to be inefficient (Victor and Knight, 1979; Sutter, 1992).

Because  of  these  shortcomings,  we  endeavored  to  record  a  deterministic  and  analytical 
reformulation  of  spectrotemporal  reverse  correlation  (Klein  et  al.,  2000).  The  roots  of  this 
new  methodology  are  in  the  Fourier-based  analysis  (Papoulis,  1962)  of  any  given  stimulus 
in terms of its spectrotemporal modulation frequency content. Each spectrotemporal modulation
frequency is the conjunction of a spectral and a temporal modulation frequency;
the  higher  the  spectral  modulation  frequency,  the  sharper  the  spectral  feature  (e.g.,  sharp 
peaks  or  edges  in  the  spectrum),  and  the  higher  the  temporal  modulation  frequency,  the 
more  abruptly  that  feature  changes  in  time.  As  a  population,  central  auditory  neurons  only 
respond  linearly  to  a  select  range  of  low  spectral  and  temporal  modulation  frequencies  (Rees 
and  Moller,  1983;  Shamma  et  al.,  1995;  Schreiner  and  Calhoun,  1995;  Kowalski  et  al.,  1996a; 
Depireux et al., 2001; Sen et al., 2001; Miller et al., 2002; Escabí and Schreiner, 2002). Not
surprisingly,  the  most  fruitful  stimuli  have  had  their  spectrotemporal  modulation  frequency 
content  either  explicitly  or  implicitly  concentrated  within  this  relevant  range.  Our  approach 
extends  these  past  successes  by  making  explicit  the  relations  between  the  spectrotemporal 
modulation  frequency  content  of  a  stimulus,  the  stimulus  duration  and  bandwidth,  and  the 
accuracy  of  the  STRF  measurement.  This  enables  the  flexible  design  of  diverse  stimuli  that 
minimize both stimulation time and measurement error, within the constraints of a particular
experiment. These constraints include information about not only the STRF, but also
about the nonlinear and stochastic aspects of the stimulus-response transformation, which
are not directly described by the STRF. Another important advantage of this methodology is
that it can be used to describe the mechanics of STRF measurement with any given stimulus,
thus providing a language with which apparently disparate methods can be discussed.

We focus in this article on three specific types of stimuli with increasing level of complexity,
applied in primary auditory cortex (AI) of the ferret. On one side of the spectrum are the
dynamic ripple stimuli (Kowalski et al., 1996a,b; Depireux et al., 2001), which each consist
of a single spectrotemporal modulation frequency. At the other extreme is spectrotemporally
white noise (STWN), which contains many superimposed spectrotemporal modulation frequencies.
Intermediate are temporally orthogonal ripple combinations (TORCs), consisting
of special combinations of several spectrotemporal modulation frequencies each. We shall
explore the relations between these stimuli, and compare the responses they evoke and the
resulting STRF measurements. Among the issues addressed are the similarity between the
STRF measurements, their fidelity and noise-robustness, their susceptibility to common neuronal
nonlinearities, and the expected amount of data necessary to achieve a measurement
with a desired level of accuracy.

2  Methods 

2.1  Theory 

In  this  section,  we  outline  the  methodological  basis  of  this  study.  Its  key  element  is  an 
analytical  description  of  the  STRF-based  stimulus-to-response  transformation,  in  terms  of 
the  processing  of  spectrotemporal  modulation  frequencies.  In  this  context,  the  result  of 
reverse  correlation  is  derived,  first  assuming  that  the  response  is  deterministically  and  linearly 
related  to  the  stimulus,  and  then  considering  the  separate  effects  of  response  variability  and 
nonlinearity.

At  the  core  of  the  STRF-based  model  of  neural  functionality  is  the  following  equation: 


$$ r(t) = \iint \mathrm{STRF}(\tau, x) \cdot S(t - \tau, x) \, d\tau \, dx, \tag{1} $$


where the neuronal response $r$ at any time $t$ is the linear integration of influences arising
from stimulus energy $S$ at different tonotopic locations $x$ (here corresponding to the logarithm
of frequency) and different times in the past $\tau$. The strength and nature of the
influences, whether excitatory (positive) or suppressive (negative), are described
by $\mathrm{STRF}(\tau, x)$. In the context of reverse correlation, $r(t)$ is typically taken to be the
time-dependent spike rate of a neuron (Eggermont et al., 1983a; Keller and Takahashi, 2000; Sen
et al., 2001).
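A discretized form of Eq. (1) may help make the operation concrete. The sketch below is illustrative only: the function name `strf_response`, the array layout, and the choice to treat stimulus energy before the first sample as zero are assumptions, and the constant factor from the sampling intervals is omitted.

```python
import numpy as np

def strf_response(strf, spec):
    """Discretized Eq. (1): r[t] = sum_x sum_tau STRF[tau, x] * S[t - tau, x].

    strf : array (n_lags, n_chan), lag tau along axis 0
    spec : array (n_time, n_chan), dynamic spectrum S[t, x]
    Stimulus energy before t = 0 is treated as zero (an assumption).
    """
    n_time, n_chan = spec.shape
    r = np.zeros(n_time)
    for x in range(n_chan):
        # Causal linear convolution along time for one channel,
        # truncated to the stimulus duration.
        r += np.convolve(spec[:, x], strf[:, x])[:n_time]
    return r

# Toy check: a unit impulse at t = 2 in channel 1 reads out that
# channel's STRF profile, delayed by 2 bins.
strf = np.array([[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]])
spec = np.zeros((6, 2))
spec[2, 1] = 1.0
r = strf_response(strf, spec)
```

The negative value in `r` illustrates that the linear prediction can fall below zero, unlike an actual spike rate; the paper returns to this point when discussing rectification.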


2.1.1  The  Linear  Processing  of  Spectrotemporal  Modulation  Frequencies 

Our analytical description of dynamic spectra is based upon the Fourier series (Papoulis,
1962), using elemental Fourier components, illustrated in Figure 1A, which are cosine waves
as a function of both $t$ and $x$: $a \cos(2\pi w t + 2\pi \Omega x + \psi)$. The wave has a peak value of $a$ and a
starting phase of $\psi$. The wave frequency is $w$ cycles/second (Hz) along $t$ and $\Omega$ cycles/octave
(cyc/oct) along $x$. Since the dynamic spectrum details the modulation of acoustic energy
as a function of both $x$ and $t$, these frequencies are referred to as modulation frequencies:
spectral ($\Omega$) and temporal ($w$). A single Fourier component is said to consist of a single
spectrotemporal modulation frequency, defined by a specific $(w, \Omega)$ pair. Just as a sum of pure
tones of various frequencies, amplitudes, and phases can describe any acoustic waveform over
a finite duration, a sum of various spectrotemporal modulation frequencies (with appropriate
amplitudes and phases) can describe any dynamic spectrum over a finite duration $T$ and
bandwidth $X$. Further, just as the frequency content of an acoustic waveform (i.e., the
amplitudes and phases of its constituent tones) is described by its (Fourier) spectrum, the
spectrotemporal modulation frequency content of a dynamic spectrum is described by its
spectrotemporal modulation spectrum ($MS_{ST}$).

When the STRF is recast as operating upon the $MS_{ST}$, one arrives at a complementary
description called the spectrotemporal modulation transfer function ($MTF_{ST}$). The $MTF_{ST}$,
which is given by the Fourier-series description of the STRF, details the linear neural processing
of spectrotemporal modulation frequencies. Such processing is already under study in
auditory neurophysiology (Kowalski et al., 1996a,b; Depireux et al., 2001; Miller et al., 2001,
2002; Escabí and Schreiner, 2002) and psychoacoustics (Chi et al., 1999), and is also being
investigated for various signal-processing tasks, including audio coding (Atlas and Shamma,
2003; Klein et al., 2003) and speech recognition (Hermansky, 1999; Nadeu et al., 2001;
Kleinschmidt and Gelbart, 2002; Kleinschmidt, 2002).

The $MS_{ST}$ and $MTF_{ST}$ are mathematically defined as follows. Consider a dynamic spectrum
$S$ and an STRF, both given over a finite range of $T$ seconds and $X$ octaves. Using the
exponential form of the Fourier series, $S$ can be expressed by the following sum,

$$ S(t, x) = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \left( a[w_k, \Omega_l] \, e^{j\psi[w_k, \Omega_l]} \right) e^{j 2\pi (w_k t + \Omega_l x)} \tag{2} $$

Figure 1: Fourier-based description of the STRF. A: Examples of the basic components used by the Fourier
series to describe arbitrary dynamic spectra and STRFs. They are cosine waves as functions of $t$ (time)
and $x$ (log-frequency), with ‘temporal modulation frequency’ (along $t$) of $w$ cycles/second (Hz) and ‘spectral
modulation frequency’ (along $x$) of $\Omega$ cycles/octave (cyc/oct). If $w$ and $\Omega$ have the same sign, the spectral
profile drifts toward lower $x$’s over time; otherwise, it drifts upward. B: An $\mathrm{STRF}[\tau, x]$, derived from an
actual measurement, detailing how stimulus energy at various $x$’s linearly influences the response at various
latencies $\tau$. Positive influence (activation) is represented by hotter colors (e.g., red), and negative influence
(suppression) by cooler colors (e.g., blue). C: The magnitude of $MTF_{ST}[w, \Omega]$, obtained from the Discrete
Fourier Transform (DFT) of the STRF in B (over $T$ = 250 ms and $X$ = 5 oct), indicating the amplitudes of
the Fourier components required to describe the STRF. The phases are not shown. The points corresponding
to the components from A are outlined by black boxes. Quadrants 3 and 4 are the complex conjugates of
Quadrants 1 and 2, and are not shown in subsequent figures. D: Dynamic spectrum $S[t, x]$ of a 250 ms
segment of a ferret vocalization, computed using a cochlea-like filter bank (Yang et al., 1992). Overlaid is
a contour of the $\mathrm{STRF}[-\tau, x]$ (from B) highlighting its interpretation as the spectrotemporal pattern that
maximally activates the neuron. E: The magnitude of $MS_{ST}[w, \Omega]$, indicating the amplitudes of the Fourier
components required to describe $S[t, x]$ from D. Overlaid is a contour of $MTF_{ST}[w, -\Omega]$, highlighting the
modulation frequencies important for determining the response via Eq. (3). F: The response $r[t]$, produced
by temporally convolving the STRF with $S$, and summing over $x$. The overlaid STRF depicts this operation
at an instant $t$, when the neuron is maximally activated. G: The Discrete Fourier Transform of the response,
$r[w]$ (only magnitudes are shown), produced by multiplying $MTF_{ST}[w, -\Omega]$ with $MS_{ST}[w, \Omega]$, and summing
over $\Omega$, as per Eq. (3).

where $e$ is the base of the natural logarithm, $j = \sqrt{-1}$, $k$ and $l$ are integers, $w_k = k/T$, and
$\Omega_l = l/X$. This is perhaps the simplest form of the Fourier series to use, although it employs
the badly named “complex” exponential functions. These functions are related to the real-valued
Fourier components through the trigonometric identity $\cos(\phi) = \frac{1}{2}(e^{j\phi} + e^{-j\phi})$, etc.
Accordingly, each term in this sum, indexed by $k$ and $l$, has a complex-conjugate counterpart,
indexed by $-k$ and $-l$, such that $a[w_k, \Omega_l] = a[w_{-k}, \Omega_{-l}]$ and $\psi[w_k, \Omega_l] = -\psi[w_{-k}, \Omega_{-l}]$.
Henceforth we will simplify the notation by dropping the $k$ and $l$ subscripts, however keeping
in mind that $w$ and $\Omega$ are discrete-valued variables (as indicated by the square brackets).
Thus, the amplitudes and phases of the ripple components are given by $a[w, \Omega]$ and $\psi[w, \Omega]$,
which together form the $MS_{ST}$, $MS_{ST}[w, \Omega] = a[w, \Omega] e^{j\psi[w, \Omega]}$. As for the STRF, its Fourier
series description employs the same ripple components, but with different amplitudes $b[w, \Omega]$
and phases $\theta[w, \Omega]$, which together form the $MTF_{ST}$, $MTF_{ST}[w, \Omega] = b[w, \Omega] e^{j\theta[w, \Omega]}$.

In practice, $S(t, x)$ is represented on a computer by discrete samples, $S[t_k, x_l] = S(k\Delta t, l\Delta x)$,
taken at a rate of $1/\Delta t$ samples/second and $1/\Delta x$ samples/octave, where $k$ and $l$ are integers.
Again, we will drop the $k$ and $l$ subscripts, however keeping in mind that $t$ and $x$ are
now discrete-valued variables. By the sampling theorem (Oppenheim and Schafer, 1989),
this assumes that $S$ is sufficiently smooth; that is, it can be described by a limited number
of temporal and spectral modulation frequencies no higher than $1/(2\Delta t)$ and $1/(2\Delta x)$,
respectively. Within these limits, $MS_{ST}[w, \Omega]$ is then obtained by computing the Discrete
Fourier Transform (DFT) of $S[t, x]$ (using the Fast Fourier Transform, or FFT, algorithm)
(Oppenheim and Schafer, 1989). Analogously, $MTF_{ST}[w, \Omega]$ is obtainable via the (Discrete)
Fourier Transform of $\mathrm{STRF}[t, x]$. An example STRF and corresponding $MTF_{ST}$ magnitude
are shown in Figures 1B and C, and an example dynamic spectrum and corresponding $MS_{ST}$
magnitude are shown in Figures 1D and E.
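As a hedged illustration of the DFT step, the sketch below computes the modulation spectrum of a single sampled ripple with NumPy's `fft2`. The sample counts (`n_t`, `n_x`), the ripple parameters, and the helper name `modulation_spectrum` are assumptions chosen for the example; only $T$ = 250 ms and $X$ = 5 oct come from the text. Dividing by the number of samples makes the DFT entries match the Fourier-series coefficients $a e^{j\psi}$ for components that are periodic over $T$ and $X$.

```python
import numpy as np

T, X = 0.25, 5.0          # seconds and octaves, as in the paper
n_t, n_x = 50, 50         # hypothetical numbers of samples along t and x
t = np.arange(n_t) * (T / n_t)
x = np.arange(n_x) * (X / n_x)

def modulation_spectrum(spec):
    """2-D DFT of a sampled dynamic spectrum, scaled by the sample count
    so that entries correspond to Fourier-series coefficients."""
    return np.fft.fft2(spec) / spec.size

# Modulation-frequency axes: multiples of 1/T (Hz) and 1/X (cyc/oct).
w_axis = np.fft.fftfreq(n_t, d=T / n_t)
om_axis = np.fft.fftfreq(n_x, d=X / n_x)

# A single ripple a*cos(2*pi*w0*t + 2*pi*om0*x + psi); values are arbitrary.
a, w0, om0, psi = 1.0, 8.0, 0.4, np.pi / 3
ripple = a * np.cos(2 * np.pi * (w0 * t[:, None] + om0 * x[None, :]) + psi)
ms = modulation_spectrum(ripple)
```

In this sketch the ripple appears as exactly two nonzero entries: `ms[2, 2]`, with value $(a/2) e^{j\psi}$ at (8 Hz, 0.4 cyc/oct), and its complex conjugate at `ms[-2, -2]`, which mirrors the conjugate symmetry described in the text.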

Since  the  response,  r(t),  depends  only  on  time,  its  Fourier-series  description  utilizes 
only  temporal  modulation  frequencies.  It  can  be  derived  by  inserting  the  Fourier-series 
descriptions  of  S  and  STRF  into  Eq.  (1)  and  carrying  out  the  integration.  The  result  is 
such  that  the  Fourier  Transform  of  r[t]  (the  sampled  response),  will  have  the  form 

$$ r[w] = \sum_{\Omega} MTF_{ST}[w, -\Omega] \cdot MS_{ST}[w, \Omega] = \sum_{\Omega} MTF_{ST}[w, \Omega] \cdot MS_{ST}[w, -\Omega] \tag{3} $$

Recall that in Eq. (1) the response was obtained by integrating over the spectral axis ($x$)
after temporally convolving the dynamic spectrum with the STRF (illustrated in Figures
1D and F); here, the convolution is realized via the multiplication of Fourier Transforms¹
(Oppenheim and Schafer, 1989), and the integration over $x$ is replaced by a summation over
$\Omega$ (illustrated in Figures 1E and G). Therefore, each frequency $w$ in the response results
from all spectrotemporal modulation frequencies in the stimulus sharing the same $w$.
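The equivalence between the time-domain and modulation-frequency-domain computations can be checked numerically. The following sketch uses random arrays as stand-ins for an STRF and a dynamic spectrum (both hypothetical), assumes circular convolution in time, and uses NumPy's unnormalized DFT, which is why a factor of `1/n_x` appears that is absent from the paper's Fourier-series form.

```python
import numpy as np

def response_time_domain(strf, spec):
    """Circular version of Eq. (1): sum over lag tau and channel x."""
    n_t = spec.shape[0]
    r = np.zeros(n_t)
    for tau in range(strf.shape[0]):
        # np.roll(spec, tau, axis=0)[t] equals spec[(t - tau) mod n_t].
        r += np.sum(strf[tau] * np.roll(spec, tau, axis=0), axis=1)
    return r

def response_freq_domain(strf, spec):
    """Sum over Omega of MTF[w, -Omega] * MS[w, Omega], then inverse DFT."""
    mtf = np.fft.fft2(strf)
    ms = np.fft.fft2(spec)
    n_x = spec.shape[1]
    neg = (-np.arange(n_x)) % n_x          # index of -Omega for each Omega
    r_hat = np.sum(mtf[:, neg] * ms, axis=1) / n_x
    return np.fft.ifft(r_hat).real

rng = np.random.default_rng(1)
strf = rng.standard_normal((16, 8))   # hypothetical STRF[tau, x]
spec = rng.standard_normal((16, 8))   # hypothetical S[t, x]
r_time = response_time_domain(strf, spec)
r_freq = response_freq_domain(strf, spec)
```

The two results agree to numerical precision, which illustrates that each response frequency $w$ collects contributions only from stimulus components sharing that $w$.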

2.1.2  Fourier-based  Reformulation  of  Spectrotemporal  Reverse  Correlation 

In Section 2.1.1, the STRF was recast in terms of the processing of spectrotemporal modulation
frequencies. The result of spectrotemporal reverse correlation will now be derived in
this context.

1  Strictly  speaking,  this  implements  a  circular  convolution.  If  the  stimulus  is  not  periodic,  this  can  be 
converted  to  a  linear  convolution  by  including  zeros  (silence)  before  and  after  the  stimulus  (Oppenheim  and 
Schafer,  1989). 
If spike times are quantized, and stimuli are sampled, with a temporal resolution $\Delta t$,
then the average stimulus preceding a neuron’s spikes is proportional to the temporal cross-correlation
between the stimulus and a response, $y[t]$, consisting of the number of spikes
observed in consecutive $\Delta t$ intervals (Eggermont et al., 1983b). For now, we assume that
$y[t]/\Delta t$, with units of spike rate (spikes/second), is equal to $r[t]$ (the sampled STRF-based
response), whose Fourier Transform $r[w]$ was derived in Eq. (3). Cross-correlation is a linear
operation and, much like convolution, it can be realized via the multiplication of Fourier
Transforms² (Oppenheim and Schafer, 1989). This takes the following form, in the case of
spectrotemporal reverse correlation:


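$$
\begin{aligned}
r[w] \cdot MS^{*}_{ST}[w, -\Omega] &= MTF_{ST}[w, \Omega] \cdot \left| MS_{ST}[w, -\Omega] \right|^{2} + \sum_{\Omega' \neq \Omega} MTF_{ST}[w, \Omega'] \, MS_{ST}[w, -\Omega'] \, MS^{*}_{ST}[w, -\Omega] \\
&= MTF_{ST}[w, \Omega] \left( a[w, -\Omega] \right)^{2} + \varepsilon[w, \Omega],
\end{aligned} \tag{4}
$$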


where $*$ denotes complex conjugation and $|MS_{ST}[w, \Omega]| = \sqrt{MS_{ST}[w, \Omega] \cdot MS^{*}_{ST}[w, \Omega]} =
a[w, \Omega]$ is the magnitude of $MS_{ST}$. Eq. (4) represents the Fourier Transform of the reverse
correlation result.

An important special case exists when $|MS_{ST}|$ is flat ($a[w, \Omega] = a$) over the extent of
$MTF_{ST}$ that is nonzero, and further $\varepsilon[w, \Omega] = 0$. Then, Eq. (4) is proportional to the
$MTF_{ST}$, with


$$ MTF_{ST}[w, \Omega] = \frac{r[w] \cdot MS^{*}_{ST}[w, -\Omega]}{a^{2}}. \tag{5} $$


Since $\mathrm{STRF}[t, x]$ is, by definition, the inverse Fourier Transform of $MTF_{ST}[w, \Omega]$, this implies
that, in this special case, reverse correlation will yield a result proportional to the STRF.


The flat $|MS_{ST}|$ requirement is equivalently a requirement that the stimulus contain in
equal strength all spectrotemporal modulation frequencies needed to construct the $MTF_{ST}$.
If the stimulus contains a subset of the necessary modulation frequencies, then only part
of the $MTF_{ST}$ can be constructed; the $MTF_{ST}$ will be filtered. The $\varepsilon = 0$ requirement is
not so simply stated. This is a systematic, stimulus-induced error dependent upon temporal
correlations between different spectrotemporal modulation frequencies in the stimulus (it
may also be framed in terms of temporal correlations between the stimulus energy at different
tonotopic locations) (Klein et al., 2000; Theunissen et al., 2000). It will be nonzero if the
stimulus contains multiple spectrotemporal modulation frequencies that share the same value
of $|w|$, and therefore by Eq. (3) evoke the same frequency in the response. For a general
stimulus, $\varepsilon$ will not be zero, or even small, and therefore one of three methods must be
used to eliminate or reduce its effects: First, if stimuli are sufficiently diverse over time
or over multiple stimuli, then $\varepsilon$ asymptotically approaches zero as the stimulus duration
or the number of stimuli increases (Klein et al., 2000); second, specially designed stimuli
may be employed for which $\varepsilon$ is zero (Kvale et al., 1998; Klein et al., 2000); and third,
additional computations may be undertaken to try to adjust for the correlations in the
stimulus (Aertsen et al., 1980; Aertsen and Johannesma, 1981a; Theunissen et al., 2000). In
this article, we concentrate on the first two of these methods.
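The special case of a flat modulation spectrum with zero stimulus-induced error can be illustrated numerically. In the sketch below, which is an assumption-laden toy and not the authors' stimulus design, each temporal modulation index is paired with a single spectral index, so no two stimulus components share the same $|w|$. The "true" STRF is a random array, the grid sizes are arbitrary, and the factor `n_x` arises only from NumPy's unnormalized DFT convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_x = 32, 16                       # hypothetical grid size
strf = rng.standard_normal((n_t, n_x))  # hypothetical "true" STRF[tau, x]
mtf_true = np.fft.fft2(strf)            # unnormalized DFT, axes (w, Omega)

# Stimulus design: flat amplitude a, one spectral index per temporal index.
a, l0 = 1.0, 3
ks = np.arange(1, 6)
ms = np.zeros((n_t, n_x), dtype=complex)
for k in ks:
    psi = rng.uniform(0.0, 2.0 * np.pi)
    ms[k, l0] = a * np.exp(1j * psi)
    ms[-k, -l0] = a * np.exp(-1j * psi)  # conjugate partner keeps S real
spec = np.fft.ifft2(ms).real

# Linear response: circular version of Eq. (1).
r = np.zeros(n_t)
for tau in range(n_t):
    r += np.sum(strf[tau] * np.roll(spec, tau, axis=0), axis=1)
r_hat = np.fft.fft(r)

# Reverse correlation in the modulation domain, normalized by a**2.
# The stimulus index l0 plays the role of -Omega, so the estimate
# corresponds to the transfer function at spectral index -l0.
mtf_est = n_x * r_hat[ks] * np.conj(ms[ks, l0]) / a**2
```

Here `mtf_est` matches `mtf_true[ks, -l0]`, that is, only the part of the transfer function probed by the stimulus is recovered. Placing two spectral indices at the same temporal index would introduce the cross term $\varepsilon$ and break this equality.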


2  Modulo  the  previous  note  concerning  circular  convolution 
Thus far, we have assumed that the response is deterministically and linearly related to
the dynamic spectrum. In the next two sections, we relax these assumptions and consider
how response variability and nonlinearity affect the real-world results. Accordingly, Eq. (5)
is henceforth treated as a measurement of the $MTF_{ST}$ (and subsequently the STRF), using
an observed response that is not necessarily fully described by the STRF.

2.1.3  Reliability  of  the  STRF  Measurement 

We  have  assumed  thus  far  that  the  transformation  from  stimulus  to  response  is  deterministic. 
However,  in  response  to  identical  stimulus  presentations,  neuronal  responses  exhibit  inherent 
variability  (Shadlen  and  Newsome,  1998),  and  so  the  result  of  reverse  correlation  is  somewhat 
indeterminate.  Therefore,  Eq.  (4)  should  be  interpreted  as  the  mean  result,  which  would  be 
obtained  by  averaging  the  results  of  an  infinite  number  of  identical  experiments.  Due  to  the 
linearity  of  reverse  correlation,  this  is  also  the  result  obtained  if  r[t]  is  taken  to  be  the  mean 
of $y[t]/\Delta t$ (the mean time-dependent spike rate).

This mean result is called the signal. The difference between the actual measurement
and its mean is called noise. The exact form of the noise varies from measurement to
measurement. The mean squared-magnitude of the noise, as a function of $t$ and $x$, is called
the variance of the measurement (the square of the standard error). The overall reliability
of the measurement can be gauged from the signal-to-noise ratio, SNR $= P / \langle\sigma^2\rangle$, which
is the average power (squared-magnitude) of the signal ($P$) relative to the average variance
of the noise ($\langle\sigma^2\rangle$), where the averages are performed over all $t$ and $x$. Note that both $P$
and $\langle\sigma^2\rangle$ are preserved by the Fourier Transform (Papoulis, 1962; Oppenheim and Schafer,
1989), and therefore the SNR of $\mathrm{STRF}[t, x]$ is identical to that of $MTF_{ST}[w, \Omega]$ (with the
averages performed over $w$ and $\Omega$).

With  this  in  mind,  the  signal  and  noise  components  of  the  SNR  can  be  directly  traced 
through  Eq.  (5)  to  the  response.  Here,  the  variance  of  the  MTFst  is  found  to  be 

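$$ \mathrm{Var}\{ MTF_{ST}[w, \Omega] \} = \frac{\mathrm{Var}\{ r[w] \cdot MS^{*}_{ST}[w, -\Omega] \}}{a^{4}} = \frac{\mathrm{Var}\{ r[w] \}}{a^{2}}, \tag{6} $$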

since  r[w]  is  the  only  source  of  variance. 

Analogously,  the  squared-magnitude  (power)  of  the  MTFst  is 

$$ \left| MTF_{ST}[w, \Omega] \right|^{2} = \frac{\left| r[w] \right|^{2}}{a^{2}}. \tag{7} $$

If $r$ is taken to be the mean response, this equation describes the signal power. If instead $r$
denotes the actual response, then the resulting $MTF_{ST}$ measurement (and equivalently, the
STRF measurement) will be composed of signal plus noise, and therefore its average power
will exceed $P$ by $\langle\sigma^2\rangle$, provided the signal and noise components are uncorrelated.
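The relation between measured power, $P$, and $\langle\sigma^2\rangle$ suggests a simple estimator when several repeated STRF measurements are available. The sketch below is one possible reading, not the procedure used in the paper: the function name `strf_snr`, the use of the sample variance across repetitions, and the toy data are assumptions.

```python
import numpy as np

def strf_snr(measurements):
    """Estimate SNR = P / <sigma^2> from repeated STRF measurements.

    measurements : array (n_rep, n_lags, n_chan)
    Returns (snr, signal_power, noise_variance) for the mean STRF.
    """
    m = np.asarray(measurements, dtype=float)
    n_rep = m.shape[0]
    mean_strf = m.mean(axis=0)
    # Variance of the mean at each (t, x): the squared standard error.
    var_of_mean = m.var(axis=0, ddof=1) / n_rep
    noise = var_of_mean.mean()                 # <sigma^2>, averaged over t, x
    # The measured mean's average power exceeds P by <sigma^2>,
    # assuming signal and noise are uncorrelated.
    signal = np.mean(mean_strf ** 2) - noise   # estimate of P
    return signal / noise, signal, noise

# Toy data: two repetitions placed symmetrically around a known pattern.
signal_true = np.array([[1.0, 2.0], [3.0, 4.0]])
reps = np.stack([signal_true + 0.5, signal_true - 0.5])
snr, p, s2 = strf_snr(reps)
```

With few repetitions the estimate of $P$ can be noisy or even negative, so in practice a resampling approach may be preferable.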

In summary, response variability is a source of error in the STRF measurement. This
is referred to as non-systematic error, since its exact form varies from measurement to
measurement. The expected size of the error is quantified by $\langle\sigma^2\rangle$. At the same time, the
signal power ($P$) and response power are closely related. Therefore, stimuli that maximize the
response power relative to the response variance will result in more reliable STRF measurements
(higher SNR). Note also that, in theory, the SNR of the STRF measurement could be
obtained directly from the response, without actually computing the STRF.
2.1.4  Nonlinear  Interference

We  have  assumed  so  far  that  a  purely  linear  relationship  exists  between  modulations  in 
the  dynamic  spectrum  and  modulations  of  the  mean  spike  rate.  In  reality,  nonlinearities 
such  as  rectification  (the  strictly  positive  nature  of  the  spike  rate)  and  synaptic  depression 
(Chance  et  al.,  1998;  Carandini  et  al.,  2002)  introduce  additional  response  components.  To 
the extent that these components are correlated with the stimulus, they result in systematic,
stimulus-dependent errors in the STRF measurement.

A  detailed  accounting  for  various  nonlinearities  is  not  given  here.  Suffice  it  to  say  that 
a  portion  of  the  response  can  be  described  by  Eq.  (1),  and  the  remaining  nonlinear  portion 
may  be  described  by  additional  terms  in  a  Volterra  or  Wiener  functional  expansion,  which 
have  long  been  used  in  neuroscience  (Eggermont,  1993)  and  systems  theory  (Schetzen,  1980). 
The portion of the nonlinearity manifest at the odd- and even-numbered terms of the expansions
is dubbed odd- and even-ordered nonlinearity, respectively. Fourier-based descriptions
of the input-output characteristics of such systems are already well studied (e.g., Victor and
Knight, 1979; Victor and Shapley, 1980; Boyd et al., 1983). They describe how multiple
stimulus frequencies (e.g., spectrotemporal modulation frequencies) interact to form nonlinear
response frequencies, or distortion products. It is those distortion products manifested at
frequencies  overlapping  with  the  linear  portion  of  the  response  that  interfere  with  the  STRF 
measurement. 

Knowledge about the stimulus dependence of distortion products facilitates the detection,
identification, and extraction of nonlinear response elements (Spekreijse and Oosting,
1970; Victor and Shapley, 1980; Boyd et al., 1983). For example, odd- and even-ordered
nonlinearities are distinct in that their distortion products are composed of products of odd and
even numbers of stimulus elements, respectively. By straightforward trigonometry, one can
determine the possible response frequencies that may be observed for a stimulus of known
(or cleverly designed) composition, and further determine how the amplitude of these
distortion products will change if a gain is applied to the stimulus.
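The trigonometric bookkeeping can be sketched as follows. A term of order $n$ in a Volterra expansion multiplies $n$ stimulus components, so its output frequencies are sums of $n$ signed stimulus frequencies, and its amplitude scales as $g^n$ when the stimulus is multiplied by a gain $g$. The function name `distortion_frequencies` and the example frequencies (4 and 12 Hz) are illustrative assumptions.

```python
from itertools import product

def distortion_frequencies(stim_freqs, order):
    """Possible response frequencies (absolute values) produced by a
    nonlinearity of the given order: all sums of `order` signed
    stimulus frequencies."""
    signed = [s * w for w in stim_freqs for s in (1, -1)]
    return sorted({abs(sum(combo)) for combo in product(signed, repeat=order)})

def distortion_gain(gain, order):
    """Amplitude scaling of an order-n distortion product when the
    stimulus is multiplied by `gain`."""
    return gain ** order

stim = [4, 12]                                   # temporal frequencies in Hz
second_order = distortion_frequencies(stim, 2)   # even-ordered products
third_order = distortion_frequencies(stim, 3)    # odd-ordered products
```

For this example the second-order products fall at 0, 8, 16, and 24 Hz, away from the stimulus frequencies, whereas the third-order products include 4 and 12 Hz and therefore overlap the linear portion of the response. Doubling the stimulus amplitude would scale these products by 4 and 8, respectively, compared with 2 for the linear response.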

2.2  Application  of  the  Methodology 

We  now  detail  how  the  above  methodology  is  exploited  by  the  methods  used  in  this  study. 

2.2.1  Stimulus  Realization  and  Delivery 

A stimulus is designed by first specifying its $MS_{ST}$. Recall from Section 2.1.2 that the
spectrotemporal modulation frequencies contained in the stimulus are used to reconstruct
the STRF. Through the properties of the Fourier Series described in Section 2.1.1, the set
of frequencies required for this construction is defined by four parameters: $T$ and $X$, the
temporal extent (memory) and spectral extent (bandwidth) of the STRF; and $w_c$ and $\Omega_c$, the
maximum temporal and spectral modulation frequencies in $MTF_{ST}$. For all results reported
here, $T$ was 250 ms, $X$ was 5 octaves, $w_c$ was 24 Hz, and $\Omega_c$ was 1.4 cyc/oct. These values
were chosen a priori based upon the likely structure of STRFs in AI, as inferred from previous
studies (Kowalski et al., 1996a,b; Depireux et al., 2001).
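Given these four parameters, the grid of modulation frequencies follows from $w_k = k/T$ and $\Omega_l = l/X$. The sketch below enumerates that grid; the variable names and the decision to list only non-negative temporal frequencies (relying on conjugate symmetry) are assumptions made here, and the rounding step presumes that $w_c T$ and $\Omega_c X$ are integers, as they are for the stated values.

```python
import numpy as np

T, X = 0.25, 5.0            # seconds, octaves
w_c, omega_c = 24.0, 1.4    # Hz, cyc/oct

# Highest harmonic indices; rounding guards against floating-point error.
k_max = int(round(w_c * T))
l_max = int(round(omega_c * X))

# Temporal frequencies are multiples of 1/T, spectral ones multiples of 1/X.
w = np.arange(0, k_max + 1) / T            # non-negative half only
omega = np.arange(-l_max, l_max + 1) / X   # both drift directions

n_nonzero_w = k_max                        # temporal frequencies above 0 Hz
n_pairs = n_nonzero_w * omega.size         # (w, Omega) pairs with w > 0
```

Under these assumptions the temporal frequencies are spaced 4 Hz apart up to 24 Hz and the spectral frequencies 0.2 cyc/oct apart between -1.4 and 1.4 cyc/oct.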

The  requisite  set  of  modulation  frequencies  need  not  be  contained  within  a  single  stimulus; 
it  may  be  divided  among  multiple  stimuli.  Stimuli  thus  devised  are  used  to  independently 
reconstruct different areas of the MTFst, which are finally combined to form the complete
measurement.  Some  benefits  of  this  scheme  include  the  reduction  of  measurement  errors  and 
the  option  of  using  short-duration  stimuli  (Klein  et  al.,  2000). 

The  MSst  design  subsequently  specifies  (via  an  inverse  Fourier  Transform)  a  desired 
or  “target”  dynamic  spectrum.  We  realized  this  target  with  a  sum  of  amplitude-modulated 
(AM)  tones  of  various  carrier  frequencies  (typically  100  tones  per  octave)  and  random  phases 
(Kowalski et al., 1996a). First, the target is scaled so that its values lie within ±90% of the
mean  value.  The  mean  value,  which  corresponds  to  the  mean  amplitude  of  the  tones,  is  set 
10-20  dB  above  the  neuron’s  threshold  (measured  previously  with  pure  tones).  Finally,  the 
AM  pattern  of  each  tone  is  specified  by  the  cross-section  of  the  target  at  the  corresponding 
spectral  location. 
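
The scaling step can be sketched as follows for a single-ripple target. This is a minimal illustration under stated assumptions: the mean amplitude is normalized to 1, the ripple phase convention (w·t + Ω·x) and the grid sizes are arbitrary choices for the example, and the function names are hypothetical.

```python
import math

def ripple(w, omega, n_t=50, n_x=20, T=0.25, X=5.0, phase=0.0):
    """Zero-mean dynamic spectrum of one ripple; rows index x, columns index t."""
    return [[math.cos(2 * math.pi * (w * (i * T / n_t) + omega * (j * X / n_x)) + phase)
             for i in range(n_t)]
            for j in range(n_x)]

def scale_target(target, depth=0.9):
    """Rescale a zero-mean target so that 1 + target stays within 1 +/- depth."""
    peak = max(abs(v) for row in target for v in row)
    if peak == 0.0:
        return [[1.0 for _ in row] for row in target]
    return [[1.0 + depth * v / peak for v in row] for row in target]

# Row j is the AM envelope (relative to the mean tone amplitude) of the tone
# at spectral position x_j, i.e. a cross-section of the scaled target.
envelopes = scale_target(ripple(w=8.0, omega=0.2))
```

Each row of `envelopes` would then modulate the carrier tone at the corresponding spectral location.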

Three  types  of  stimuli  are  used  in  this  study:  dynamic-ripple  stimuli,  temporally  orthog¬ 
onal  ripple  combinations  (TORCs),  and  spectrotemporally  white  noise  (STWN).  As  exem¬ 
plified  in  Figure  2,  they  distribute  spectrotemporal  modulation  frequencies  among  stimuli 
in  different  ways.  Due  to  the  peak-amplitude  constraint  on  the  dynamic  spectra,  they  also 
employ markedly different modulation-frequency amplitudes (a); increasing the number of
modulation  frequencies  in  a  stimulus  (implying  more  complex  modulations)  generally  re¬ 
quires  the  amplitude  of  each  frequency  to  be  decreased  so  that  their  sum  is  contained  within 
a  given  range.  In  any  case,  the  amplitudes  of  all  modulation  frequencies  within  a  given  stim¬ 
ulus  were  identical.  If  a  stimulus  contained  multiple  modulation  frequencies,  their  phases 
were  randomly  assigned;  otherwise  they  were  (arbitrarily)  set  to  zero.  Additional  details 
about  these  stimuli  are  provided  later  in  Section  3.1. 


Figure 2: The MSst magnitudes are illustrated for members of each of the three stimulus
types: dynamic-ripple stimuli, TORCs, and STWN. The stimuli all have the same duration
(250 ms), and contain 1, 6, and 90 spectrotemporal modulation frequencies, respectively.
By virtue of the dynamic range constraint on the intensities of the dynamic spectrum, the
stimuli must employ different modulation-frequency amplitudes a. The employed amplitudes,
relative to those of the STWN stimulus, are indicated in parentheses.


The  Fourier  series  endows  dynamic  spectra,  thus  designed,  with  a  common  periodicity 
of T = 250 ms and X = 5 octaves. One spectral period was realized in each stimulus,
whose  5-octave  bandwidth  was  centered  upon  the  neuron’s  pure-tone  tuning  curve  (measured 
previously).  The  temporal  periodicity  of  the  dynamic  spectra  was  exploited;  this  enabled 
multiple observations of the response, since (assuming the neuron’s memory is less than T
seconds)  all  temporal  periods  beyond  the  first  constitute  identical  stimulus  presentations.  A 
stimulus  sweep  consisted  of  a  limited  number  (4  or  12)  of  stimulus  periods,  and  had  a  rise 
and  fall  time  of  8  ms.  Multiple  sweeps  were  presented  for  each  stimulus.  Sweeps  of  different 
stimuli,  separated  by  3-4  seconds  of  silence,  were  presented  in  a  pseudorandom  order,  until 
a neuron was exposed to 60-120 periods (15-30 s) of each stimulus.

All  stimuli  were  gated  and  fed  through  an  equalizer  into  an  earphone.  Calibration  of  the 
sound  delivery  system  (to  obtain  a  flat  frequency  response  up  to  20  kHz)  was  performed  in 
situ with the use of a 1/8 in. Brüel & Kjær 4170 probe microphone. The earphone was
inserted into the ear canal through the wall of the speculum to within 5 mm of the tympanic
membrane. The speculum and microphone setup closely resembles that suggested by Evans
(1979). More details on the surgery will be provided below.

2.2.2  Response  Measurement  and  STRF  Calculation 

Each stimulus resulted in a collection of response observations y[t] (i.e., binned spike trains),
each member of which consists of the number of spikes occurring in successive Δt = 1 ms
intervals during one stimulus period (see, e.g., Figure 3B). The total number of stimulus
periods used is n. The transient epochs, during the first period of each sweep, were
disregarded; only the steady-state portion of the response is utilized. The spike rate r[t] was then
estimated from the sample mean of y[t]/Δt:

$$ r[t] = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i[t]}{\Delta t}, $$

where y_i[t] is the response
to the ith stimulus period. This is the response whose Fourier Transform is used to calculate
the  MTFst  (and  subsequently  the  STRF),  or  some  portion  thereof,  via  Eq.  (5).  These 
calculations  are  very  simple  and  are  completed  in  MATLAB  (Mathworks)  in  a  fraction  of  a 
second. 
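
For concreteness, the rate estimate described above (the average of the binned spike counts over stimulus periods, divided by the bin width) can be sketched as below. The spike counts are made-up toy numbers and the function name is hypothetical; this is not the authors' MATLAB code.

```python
def spike_rate(periods, dt=0.001):
    """Average binned spike counts over periods and convert to spikes/s.

    periods: list of n lists, each holding the spike count per bin
             for one steady-state stimulus period.
    dt:      bin width in seconds (1 ms in the text).
    """
    n = len(periods)
    n_bins = len(periods[0])
    return [sum(p[t] for p in periods) / (n * dt) for t in range(n_bins)]

# Toy data: 4 steady-state periods, 5 bins of 1 ms each.
y = [
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
r = spike_rate(y)   # one rate value (spikes/s) per 1 ms bin
```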

2.2.3  Reducing  Nonlinear  Interference  with  the  Inverse-Repeat  Method 

In  this  article,  we  concentrate  on  even-ordered  nonlinearities;  they  are  ubiquitous  in  the  brain 
(e.g.,  due  to  rectification),  and  can  severely  distort  the  reverse-correlation  measurement, 
particularly when the stimulus is brief (Swerup, 1978). Fortunately, their ill effects are easily
isolated  and  extracted  by  the  inverse-repeat  method  (Moller,  1977;  Wickesberg  and  Geisler, 
1984).  In  its  simplest  form,  this  method  calls  for  two  stimuli  (here,  dynamic  spectra)  that 
sum  to  a  constant  value.  While  the  linear  responses  to  the  two  stimuli  are  opposite  in  sign, 
the  even-ordered  distortion  products  are  identical  (Victor  and  Shapley,  1980).  Therefore, 
the  even-ordered  effects  are  removed  by  subtracting  the  two  responses  and  dividing  by  two 
(or  instead  isolated  by  adding  the  responses).  This  method  is  investigated  in  conjunction 
with  TORC  stimulation. 
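
The arithmetic of the inverse-repeat method can be illustrated with a toy response model. In this sketch the model (a linear term plus a squared term plus a constant) and all names are assumptions chosen for illustration only; half the difference of the two responses retains the odd-order (here, linear) part, and half the sum retains the even-order part.

```python
def inverse_repeat(resp_pos, resp_neg):
    """Split responses to a stimulus and its inverse into two parts.

    Returns (odd_part, even_part): half the difference and half the sum.
    """
    odd_part = [(a - b) / 2.0 for a, b in zip(resp_pos, resp_neg)]
    even_part = [(a + b) / 2.0 for a, b in zip(resp_pos, resp_neg)]
    return odd_part, even_part

def toy_response(s):
    """Hypothetical response: 2*s (linear) + 0.5*s**2 (2nd order) + 3 (0th order)."""
    return [2.0 * v + 0.5 * v * v + 3.0 for v in s]

s = [0.0, 1.0, -2.0, 0.5]      # stimulus deviation from its mean
s_inv = [-v for v in s]        # inverse stimulus: s + s_inv is constant (zero)
linear, even = inverse_repeat(toy_response(s), toy_response(s_inv))
```

Here `linear` recovers `2*s` exactly, while `even` holds the squared term and the constant.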

2.2.4  Signal  and  Noise  Calculations 

As mentioned in Section 2.1.3, the measures of signal power (P) and noise variance (⟨σ²⟩),
and therefore the SNR, apply to both STRF[t, x] and MTFst[w, Ω]. For a single
stimulus-response pair, a simple relationship was identified in Eq. (6) between the variance of
MTFst[w, Ω] and the variance of r[w]. Note that the latter variance is, in turn, proportional
to the variance of y[w] (the Fourier Transform of the response to one stimulus period);
specifically,

$$ \mathrm{Var}\{r[w]\} = \frac{\mathrm{Var}\{y[w]\}}{n \, \Delta t^{2}}. \qquad (8) $$

Thus, the variance of MTFst[w, Ω] could be quickly estimated from the sample variance of
y[w] (across all stimulus periods), without repeating the experiment or subdividing the data.

However,  the  MTFst  measurement  may  incorporate  the  measurements  from  multiple 
stimulus-response pairs; if so, its variance will depend on how the individual measurements
are combined. If a point on MTFst[w, Ω] is the average of N measurements, then its variance
will simply tend to scale by 1/N with respect to that of an individual measurement. But more
complicated  functions  of  the  individual  measurements  (such  as  that  used  for  the  dynamic- 
ripple  stimuli  (Depireux  et  al.,  2001))  may  obscure  the  relation  between  the  variance  of 
the  MTFst  and  that  of  the  constituent  responses.  In  such  a  case,  the  bootstrap  method 
may  be  employed.  This  method  simulates  the  randomness  of  a  statistic  that  is  a  function 
of  a  collection  of  identical  observations,  without  repeating  the  experiment  or  subdividing 
the  observations  (Efron  and  Tibshirani,  1993;  Politis,  1998).  In  the  present  context,  a  new 
MTFst  is  computed  from  a  new,  identical-sized  collection  of  y[t],  assembled  by  selecting 
members  of  the  original  collection  randomly  and  with  replacement.  The  sample  variance  of 
the  MTFst,  or  some  function  thereof,  is  calculated  after  repeating  the  process  many  times 
(we  used  300),  which  is  feasible  due  to  the  simplicity  of  the  computations. 
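
The resampling procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the statistic here is a simple mean over scalar observations standing in for the MTFst computation, the observations are synthetic, and the function names and seeds are hypothetical.

```python
import random

def bootstrap_variance(observations, statistic, n_boot=300, seed=0):
    """Sample variance of `statistic` over bootstrap resamples.

    Each resample has the same size as the original collection and is drawn
    from it randomly and with replacement.
    """
    rng = random.Random(seed)
    n = len(observations)
    values = []
    for _ in range(n_boot):
        resample = [observations[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(resample))
    mean = sum(values) / n_boot
    return sum((v - mean) ** 2 for v in values) / (n_boot - 1)

def mean_value(obs):
    """Stand-in statistic; in the paper the statistic is the MTFst."""
    return sum(obs) / len(obs)

# Synthetic observations, one scalar per stimulus period (made-up numbers).
data_rng = random.Random(1)
counts = [data_rng.gauss(10.0, 2.0) for _ in range(50)]
boot_var = bootstrap_variance(counts, mean_value)
```

For a simple average, `boot_var` should come out close to the sample variance of `counts` divided by the number of observations, which is the 1/N scaling noted above.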

For  the  sake  of  equal  footing,  we  used  the  bootstrap  method  to  estimate  the  variance  of 
the MTFst for all stimulus types. After subsequently calculating ⟨σ²⟩, the SNR was inferred
from the average power of the MTFst, which, as mentioned in Section 2.1.3, approximately
equals P + ⟨σ²⟩.


2.2.5  Including  Systematic  Measurement  Errors  in  the  Signal-to-Noise  Ratio 

The SNR quantifies the size of the signal compared to the size of the non-systematic
component of the measurement error. However, the possible additional contribution of systematic
errors, that is, those induced by nonideal stimulus structure (i.e., e in Eq. (4)) and by
nonlinearities, causes the actual error level of the STRF measurement to exceed that described
by the SNR. The present results offer an opportunity to obtain a more “correct” measure
of the SNR, provided that all errors are evenly distributed over the measurement, because
the signal tends to be concentrated in an early region of the measurement between 0 and 125
ms. Accordingly, a corrected SNR measure, SNRcor, was obtained after dividing the average
power of the early region by the average power of the late (post-125-ms) region. Note that
SNRcor should be less than or equal to SNR (modulo the inaccuracies in measuring SNR
and SNRcor), with equality when there are no systematic errors.


2.2.6 Error Reduction with the Singular-Value Decomposition

To further reduce errors in the STRF measurement, we investigated the singular-value
decomposition (SVD), applied to either STRF[t, x] or MTFst[w, Ω] (which are both just
matrices  of  numbers).  The  SVD  is  a  well-studied  tool  for  resolving  the  structure  of  matrices 
that  are  corrupted  by  errors  (Stewart,  1993;  Hansen,  1998).  It  works  by  breaking  up  an 
arbitrary  matrix  into  a  sum  of  separable  matrices,  which,  in  the  current  context,  are  each 
formed by the product of one temporal vector and one spectral vector. The first matrix
takes  the  best  separable  approximation  out  of  the  original  matrix;  the  second  takes  the  best 
separable  approximation  out  of  the  remainder,  and  so  on.  The  importance  of  each  separable 
matrix  is  gauged  by  its  singular  value,  which  is  the  square  root  of  its  average  power.  The 
total  number  of  separable  matrices  required  to  describe  a  matrix  (the  number  of  nonzero 
singular  values)  is  called  the  matrix’s  rank. 

A  basic  theorem  (Stewart,  1991)  implies  that  if  the  error-free  STRF  can  be  well  approx¬ 
imated  by  only  a  few  separable  matrices,  then  the  addition  of  small  and  evenly  distributed 
errors  will  only  slightly  perturb  their  form,  as  they  constitute  the  first  few  matrices  in  the 
SVD  of  the  STRF  measurement.  The  additional  and  subsequent  matrices  required  to  de¬ 
scribe  the  measurement  will  describe  mostly  errors,  and  thus  should  be  discarded.  In  fact, 
there  are  a  priori  reasons  to  believe  that  STRFs  are  well  approximated  by  low-rank  ma¬ 
trices.  Typically,  cortical  STRFs  are  localized  in  a  compact  area  of  the  spectrotemporal 
domain and the modulation-frequency domain (Depireux et al., 2001; Miller et al., 2002);
this  alone  will  limit  their  rank.  Still  lower  limits  will  be  imposed  by  special  structure  within 
the  STRF  or  the  MTFst,  such  as  spectral-temporal  separability  (Eggermont  et  al.,  1981; 
Depireux  et  al.,  2001;  Sen  et  al.,  2001),  quadrant  separability  (Depireux  et  al.,  2001),  and 
temporal  symmetry  (Simon  et  al.,  subm). 

In  practice,  determining  which  separable  matrices  should  be  discarded  is  a  complex  prob¬ 
lem  (Stewart,  1993;  Hansen,  1998).  Most  approaches  use  knowledge  or  assumptions  about 
the  size  and  structure  of  the  errors  to  bound  the  singular  values  (or  functions  thereof)  of  those 
separable  matrices  describing  mostly  errors.  Through  simulations,  we  found  that  methods 
based  solely  on  variability  analysis  tended  to  underestimate  the  size  of  the  errors;  instead, 
the most generally accurate methods gauged the error level directly from the post-125-ms
region of the STRF measurement (for a similar method see Sen et al., 2001). We used the
largest  singular  value  from  this  region  (or  its  Fourier  Transform)  to  threshold  the  singular 
values  of  the  pre-125-ms  region  (or  its  Fourier  Transform).  In  theory,  the  STRF  (or  MTFst) 
is  optimally  approximated  using  only  those  separable  matrices  with  singular  values  above 
this  threshold,  and  discarding  the  remainder. 
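
The thresholding step can be sketched as below. To keep the example self-contained it extracts the separable matrices one at a time by power iteration, which is a stand-in chosen for this illustration rather than the routine the authors used; in practice a library SVD would be the usual choice. The threshold is passed in as a number, whereas in the text it is the largest singular value of the post-125-ms region. The toy matrix and all names are hypothetical.

```python
import math

def _matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def top_singular_triple(a, iters=200):
    """Largest singular value s with unit vectors u, v, by power iteration."""
    at = [list(col) for col in zip(*a)]
    v = [1.0 / math.sqrt(len(at))] * len(at)   # arbitrary starting vector
    for _ in range(iters):
        v = _matvec(at, _matvec(a, v))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0, [0.0] * len(a), [0.0] * len(at)
        v = [x / norm for x in v]
    u = _matvec(a, v)
    s = math.sqrt(sum(x * x for x in u))
    if s == 0.0:
        return 0.0, [0.0] * len(a), v
    return s, [x / s for x in u], v

def truncated_svd(a, threshold):
    """Sum of the separable matrices whose singular value exceeds threshold."""
    rows, cols = len(a), len(a[0])
    residual = [row[:] for row in a]
    approx = [[0.0] * cols for _ in range(rows)]
    kept = []
    for _ in range(min(rows, cols)):
        s, u, v = top_singular_triple(residual)
        if s <= threshold:
            break                       # remaining matrices are treated as error
        kept.append(s)
        for i in range(rows):
            for j in range(cols):
                approx[i][j] += s * u[i] * v[j]
                residual[i][j] -= s * u[i] * v[j]
    return approx, kept

# Toy "measurement" with singular values 3 and 1 (made-up numbers).
measurement = [[3.0, 0.0], [0.0, 1.0]]
approx, kept = truncated_svd(measurement, threshold=2.0)
```

With a threshold of 2, only the first separable matrix is retained and the second is discarded as error.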

Although  this  approximation  is  in  some  sense  optimal,  it  is  still  prone  to  error.  As  the 
error  level  increases,  more  and  more  error  leaks  into  the  approximation  and,  conversely,  more 
and  more  of  the  STRF  power  is  lost  under  the  error  threshold  (Hansen,  1998).  This  second 
case  is  of  primary  interest  in  this  study;  we  will  gauge  the  proportion  of  (error-free)  STRF 
power excluded from the SVD approximation. A naive gauge of this is αSVD, the proportion
of the STRF measurement’s power contained in the SVD remainder (Depireux et al., 2001).
Unfortunately, when the level of measurement error is high, αSVD itself will be inflated,
because much of the remainder will consist of error. However, we can use the bootstrap
method to estimate the size (average variance) of the part of the remainder resulting from
non-systematic errors, and subtract it out. This leads to a more accurate gauge of the
proportion of lost STRF power, particularly when the systematic errors are small: βSVD, the
average power of the systematic component of the remainder, divided by P. In Section 3.4,
we use αSVD and βSVD together to study how measurement errors affect the performance of
the SVD.

2.2.7 STRF Comparisons

In  this  article,  the  correlation  coefficient  is  used  to  quantify  the  similarity  between  two 
different STRF measurements. This takes values between −1 and +1, with +1 indicating
a perfect match. Comparisons are made over the first 125 ms of the measurements, both
before and after the SVD is applied. Note that the correlation coefficients for the pre-SVD
comparisons will be limited by SNRcor; if two identical STRFs are corrupted by independent
and identically distributed errors, the correlation coefficient should approximately equal
SNRcor/(SNRcor + 1). To the extent that the SVD approximations result in increased SNRs,
they will allow for higher correlation coefficients, which we modeled as gSNRcor/(gSNRcor +
1), where g represents a multiplicative gain in SNRcor.
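
The expected ceiling follows because, for two measurements that share a signal of power P and carry independent errors of variance σ², the covariance is P while each variance is P + σ², giving P/(P + σ²) = SNR/(SNR + 1). A small sketch of both quantities is given below; the function names are hypothetical, and the STRFs are assumed to be flattened into lists of samples.

```python
import math

def pearson(a, b):
    """Correlation coefficient between two equally long lists of samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def expected_correlation(snr_cor, gain=1.0):
    """Model ceiling g*SNRcor / (g*SNRcor + 1); gain = 1 is the pre-SVD case."""
    return gain * snr_cor / (gain * snr_cor + 1.0)

ceiling_pre = expected_correlation(1.0)              # SNRcor = 1
ceiling_post = expected_correlation(1.0, gain=3.0)   # same SNRcor, SVD gain of 3
```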

2.2.8  Simulations 

Simulations  were  employed  in  order  to  verify  the  performance  of  these  methods  under  realistic 
conditions.  The  core  of  a  simulation  is  an  STRF  (tailor-made  or  derived  from  a  low-rank 
approximation  of  an  actual  measurement)  and  a  set  of  stimuli.  The  STRF-based  responses 
to  the  stimuli  are  computed  via  Eq.  (3).  These  responses  are  then  altered;  usually  they 
are  rectified  and  then  subjected  to  another  static  nonlinearity,  such  as  a  squaring  function. 
The  result,  representing  the  time-varying  spike  rate,  is  used  to  create  spike  trains  with 
inhomogeneous  Poisson  statistics  (Berry  and  Meister,  1998;  Oram  et  al.,  1999),  with  a  time 
step of 50 μs. These spike trains are treated as the responses of a neuron with an unknown
STRF,  and  are  subjected  to  the  very  same  analyses  as  the  real  responses.  Wherever  the 
bootstrap  method  was  employed,  its  expected  performance  was  simulated  against  a  Monte- 
Carlo  procedure,  employing  300  sets  of  independent  responses  with  identical  spike  rates. 
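
The response-generation stage of such a simulation might look like the following sketch. It is an illustration under stated assumptions: the linear prediction is a made-up constant, the rectify-then-square nonlinearity is the example named in the text, and spikes are drawn with a per-step Bernoulli approximation to an inhomogeneous Poisson process (adequate only when rate × time step is much smaller than 1). The function names and seed are hypothetical.

```python
import random

def rate_from_linear(linear, scale=1.0):
    """Rectify the linear prediction, then square it (static nonlinearity)."""
    return [scale * max(v, 0.0) ** 2 for v in linear]

def poisson_spike_train(rate, dt=50e-6, seed=0):
    """Spike times (s) for a rate (spikes/s) sampled every dt seconds.

    Bernoulli approximation: each step contains a spike with
    probability rate * dt.
    """
    rng = random.Random(seed)
    return [i * dt for i, r in enumerate(rate) if rng.random() < r * dt]

# Toy linear prediction: constant value 10 for 5 s, giving 100 spikes/s.
n_steps = 100000                       # 5 s at a 50 microsecond time step
rate = rate_from_linear([10.0] * n_steps)
spikes = poisson_spike_train(rate)
```

The resulting spike times can then be binned and analyzed exactly as the recorded responses are.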

2.2.9  Surgery  and  animal  preparation 

Data were collected from 16 domestic ferrets (Mustela putorius) supplied by Marshall Farms
(Rochester, NY). The ferrets were anesthetized with sodium pentobarbital (40 mg/kg) and
maintained under deep anesthesia during the surgery. Once the recording session started, a
combination of Ketamine (8 mg/Kg/Hr), Xylazine (1.6 mg/Kg/Hr), Atropine (10 μg/Kg/Hr)
and Dexamethasone (40 μg/Kg/Hr) was given throughout the experiment by continuous
intravenous infusion, together with Dextrose, 5% in Ringer solution, at a rate of 1 cc/Kg/Hr,
to maintain metabolic stability. The ectosylvian gyrus, which includes the primary auditory
cortex, was exposed by craniotomy and the dura was reflected. The contralateral ear canal
was exposed and partly resected, and a cone-shaped speculum containing a miniature speaker
(Sony MDR-E464) was sutured to the meatal stump. For more details on the surgery see
(Shamma et al., 1993).

2.2.10  Recordings,  spike  sorting,  and  selection  criteria 

Action potentials from single units were recorded using glass-insulated tungsten
microelectrodes with 5-7 MΩ tip impedance at 1 kHz. In each animal, electrode penetrations were
made orthogonal to the cortical surface. In each penetration, cells were typically isolated at
depths of 350-600 μm corresponding to cortical layers III and IV (Shamma et al., 1993). In
12 animals, neural signals were fed through a window discriminator and the time of spike
occurrence relative to stimulus delivery was stored using a computer. In the other 4 animals,
the  neural  signals  were  stored  for  further  processing  offline.  Using  MATLAB  software  de¬ 
signed  in-house,  action  potentials  were  then  manually  classified  as  belonging  to  one  or  more 
distinct  neurons,  and  the  spike  times  for  each  neuron  were  recorded.  The  action  potentials 
assigned  to  a  single  neuron  met  the  following  criteria:  (1)  the  peaks  of  the  spike  waveforms 
exceeded  4  times  the  standard  deviation  of  the  entire  recording;  (2)  each  spike  waveform 
was  less  than  2  ms  in  duration  and  consisted  of  a  clear  positive  deflection  followed  immedi¬ 
ately  by  a  negative  deflection;  (3)  the  spike  waveforms  were  not  visibly  different  from  each 
other,  modulo  the  noise;  (4)  the  histogram  of  inter-spike-intervals  evidenced  a  minimum  time 
between  spikes  (refractory  period)  of  at  least  1  ms.  This  procedure  occasionally  produced 
units  with  very  low  spike  counts.  After  consulting  the  distribution  of  spike  counts  for  all 
units,  units  that  fired  fewer  than  one  spike  per  two  seconds  of  stimulation  were  excluded 
from  further  analysis. 

Analysis  of  the  dynamic-ripple  recordings  was  published  previously  (Depireux  et  al., 
2001).  We  used  here  the  same  selection  criteria  for  those  recordings  that  were  used  in  that 
study.  Those  criteria  were  somewhat  more  stringent  than  those  used  for  the  TORC  and 
STWN  recordings;  consequently,  there  are  conspicuously  fewer  instances  of  low-SNR  STRFs 
and  low  spike  counts  in  the  dynamic-ripple  results,  with  respect  to  the  TORC  and  STWN 
results. 


3  Results 

The  results  of  this  study  are  presented  as  follows.  In  Section  3.1,  we  detail  the  measurement 
of  a  neuron’s  STRF  using  each  of  the  three  stimulation  types,  and  we  subsequently  illustrate 
the  computation  of  the  SVD-based  STRF  approximations.  In  Section  3.2,  for  neurons  whose 
STRFs  were  measured  with  multiple  stimulus  types,  we  examine  the  similarity  between  the 
multiple  measurements  and  the  corresponding  SVD  approximations,  as  a  function  of  the 
level  of  measurement  error.  In  Section  3.3,  we  analyze  the  origins  and  stimulus  dependence 
of the measurement errors. Finally, in Section 3.4, we study how measurement errors affect
the  sufficiency  of  the  SVD  approximations. 

3.1  Overview 

In  this  section,  we  detail  the  measurement  of  a  neuron’s  STRF  using  dynamic-ripple  stimuli 
(Figure  3),  TORCs  (Figure  4),  and  STWN  (Figure  5),  respectively.  The  MSst  magnitudes 
for  examples  of  each  of  these  stimulus  types  are  illustrated  in  Figure  2.  The  respective 
STRF measurements are denoted STRFdr, STRFtorc, and STRFstwn. Computation of
the  SVD-based  approximations  of  the  measurements  is  subsequently  detailed. 

3.1.1 Dynamic-Ripple Stimuli

For the dynamic-ripple stimuli (Kowalski et al., 1996a; Depireux et al., 2001) shown in
Figure 3, each stimulus is composed of a single spectrotemporal modulation frequency (Fourier
component). It can therefore be considered the auditory equivalent to the drifting
sinusoidal luminance gratings used in visual neuroscience (De Valois, R.L. and De Valois, K.K.,
1990). Figure 3A shows the dynamic spectrum of one such stimulus, which has a temporal
modulation rate w of −8 Hz and a spectral modulation rate Ω of 0.2 cyc/oct.

The  response  (r[t])  to  this  stimulus  (B  through  D)  exhibits  both  linear  and  nonlinear 
aspects,  as  well  as  variability.  According  to  the  linear  model  of  Eq.  (3),  the  response  should 
be a pure 8 Hz sinusoid, with amplitude and phase determined by MTFst[8, 0.2]. Clearly,
r[t] (C: blue) is modulated at 8 Hz, but it also contains nonlinear components. The (Discrete)
Fourier Transform of r[t] (r[w], shown in D) makes this explicit: In addition to a prominent
8  Hz  component  (in  red),  distortion  products  (in  green)  with  frequencies  of  0  Hz  (the  “DC” 
or  average  of  r[t]  over  t)  and  16  Hz  are  plainly  visible.  Given  the  stimulus  composition,  these 
distortion  products  betray  the  presence  of  2nd-order,  and  possibly  0th-order  (“spontaneous” 
activity),  nonlinearity  (both  of  which  are  even-ordered).  With  respect  to  the  linear  plus  DC 
description  (C:  red  curve),  including  the  16  Hz  distortion  product  (C:  black)  better  accounts 
for  the  sharpness  and  non-negative  nature  of  the  response. 

The  remaining  portion  of  the  response  looks  like  noise.  It  is  the  manifestation  of  the 
period-to-period  response  variability  evident  in  B.  In  the  Fourier  Transform  (D),  it  takes  the 
form  of  a  shallow  baseline  of  energy  that  extends  over  all  frequencies.  Note  that  the  square- 
root of the response variance (i.e., the standard error), calculated via Eq. (8), is similarly
distributed  over  the  components  of  r[w]  (D:  black  curve). 

The  existence  of  response  components  due  to  nonlinearity  and  variability  does  not  nec¬ 
essarily  imply  that  they  interfere  with  the  STRF  measurement.  Since  the  stimulus  consists 
of  a  single  spectrotemporal  modulation  frequency  with  a  temporal  component  of  8  Hz,  only 
the  8  Hz  component  of  the  response  is  correlated  with  the  stimulus,  and  e  in  Eq.  (4)  is  zero. 
The  only  source  of  error  is  the  portion  of  the  nonlinearity  and  variability  that  happens  to 
be  manifest  at  8  Hz.  The  reverse-correlation  result  is  shown  in  Figures  3E  and  F.  It  consists 
of  a  single  spectrotemporal  modulation  frequency,  corresponding  to  the  8  Hz,  0.2  cyc/oct 
component  of  the  STRF.  STRFdr  is  thus  assembled;  the  result  after  two  stimuli  and  all 
stimuli  is  shown  in  G  and  H,  and  I  and  J,  respectively.  It  is  important  to  note  that,  due  to 
time constraints, these point-by-point measurements of the MTFst were restricted to two
cross-sections,  as  indicated  by  the  gray  outlines  in  I.  The  full  MTFst  was  then  constructed 
from  a  normalized  outer  product  of  these  cross-sections  (Depireux  et  al.,  2001). 

3.1.2  Temporally  Orthogonal  Ripple  Combinations 

In contrast to the dynamic-ripple stimuli, the TORC stimuli (Klein et al., 2000) can directly
measure the entirety of the MTFst, because each stimulus is used to measure multiple
points at once. The stimuli are necessarily more complex, containing six spectrotemporal
modulation frequencies (Fourier components) each. However, no two Fourier components in
a given stimulus share the same value of |w| (they are temporally orthogonal; their temporal
correlation is zero); therefore, each spectrotemporal modulation frequency in the stimulus
will evoke a different temporal frequency in the linear part of the response.

The dynamic spectrum of one TORC is shown in Figure 4A. It is composed of six
spectrotemporal modulation frequencies having the same Ω of 0.2 cyc/oct, but different w’s
spanning  the  range  of  4  to  24  Hz.  The  associated  response  (B:  in  blue)  exhibits  a  complex 
modulation  of  the  spike  rate.  The  smoothed  response,  obtained  by  discarding  the  irrelevant 
frequencies above 24 Hz, is superimposed in red. In C is a more accurate view of the
linear part of the response, which was obtained from the inverse-repeat procedure. It is very
similar to the response predicted by STRFdr (Figure 3J), which is plotted in dashed black.
The Fourier Transform of the response (D) confirms the strong presence of the 4 to 24 Hz
components (in red) expected from the linear model. However, with respect to the noise
baseline, the response is weaker than it was for the above dynamic-ripple stimulus.

Figure 3: Measuring the STRF with dynamic-ripple stimuli. A: Dynamic spectrum of a
dynamic-ripple stimulus with w = −8 Hz and Ω = 0.2 cyc/oct. 90 stimulus periods were
used. B: Rasterized spike trains, y_i[t], the ith row indicating with black dots the spike
times during the ith stimulus period, recorded with 1 ms accuracy. C: Time-dependent spike
rate estimate, r[t]: Raw estimate (blue) (using Δt = 1 ms), linear (8 Hz) plus DC (0 Hz)
approximation (red), and the approximation obtained by including the (even-ordered) 16 Hz
distortion product (black). D: Response Fourier Transform magnitude, clearly showing the
linear 8 Hz component (red), nonlinear distortion products (green), and the remaining noise
component (blue). Also shown is the square-root of the response variance (the standard
error) as a function of frequency (black). E, F: The measurements of MTFst and STRFdr
after one stimulus-response pair. G, H: Same, after two stimuli. I, J: Same, after all 30
stimuli. The grey outlines in I indicate the cross-sections of the MTFst that were directly
measured.

In  the  reverse-correlation  operation,  the  4  Hz  response  component  is  orthogonal  to  all 
stimulus  components  besides  the  4  Hz  component,  the  8  Hz  response  component  is  correlated 
only  with  the  8  Hz  stimulus  component,  and  so  on;  e  is  again  zero.  The  result  is  shown  in 
Figures 4E and F. Already after the first stimulus, six components of STRFtorc are in
place. The corresponding results after the second stimulus (employing a different set of
spectrotemporal modulation frequencies), and after all stimuli, are shown in G and H, and
I and J, respectively. The final result bears a striking resemblance to STRFdr, despite the
drop in both SNR and SNRcor. This indicates that the linear spectrotemporal processing
of the neuron is robust to changes in the spectrotemporal modulation frequency content of
stimuli.

Figure 4: Measuring the STRF with TORCs. A: Dynamic spectrum of a TORC with Ω = 0.2
cyc/oct and w’s between 4 and 24 Hz. 75 stimulus periods were used. B, C: Time-dependent
spike rate estimate, r[t], prior to (B) and after (C) the inverse-repeat procedure: Raw
estimate (blue), linear plus DC approximation (red) obtained by discarding frequencies
above 24 Hz, and the response predicted from the previously obtained STRFdr (dashed black
in C). D: Response Fourier Transform magnitude, clearly showing the linear 4-24 Hz
components (red) and the remaining noise component (blue). Also shown is the square-root of
the response variance (the standard error) as a function of frequency (black). E, F: The
measurements of MTFst and STRFtorc after one inverse-repeat pair of stimuli. G, H: Same,
after two pairs of stimuli. I, J: Same, after all 15 pairs of stimuli (30 total).

3.1.3  Spectrotemporally  White  Noise 

If a dynamic-ripple stimulus contains the simplest spectrotemporal pattern, the most
complex is contained in spectrotemporally white noise (STWN); its MSst contains all
spectrotemporal modulation frequencies (Fourier components) with equal amplitudes and
uniformly distributed phases. The typically poor efficacy of such stimuli can be improved
somewhat by limiting the MSst to a relevant range of spectral and temporal modulation
frequencies (Klein et al., 2000). Figure 5A shows the dynamic spectrum of one such
stimulus, which contained all w’s between 4 and 24 Hz and all Ω’s between 0 and 1.4 cyc/oct.
Although  the  response  shown  in  B  is  quite  a  bit  weaker  than  those  observed  in  Figures  3  and 
4,  when  smoothed  (red)  it  is  still  comparable  to  the  linear  predictions  from  both  STRFdr 
(dashed  black)  and  STRFtorc  (dashed  green);  this  is  despite  the  fact  that,  as  shown  in  C, 
the  4  to  24  Hz  response  frequencies  predicted  by  the  linear  model  are  barely  distinct  over 
the  noise  baseline. 

This reverse-correlation scenario differs from that of the other two stimulus types. Each
of the linear response frequencies is now the sum effect of multiple Fourier components of
the stimulus sharing the same temporal modulation frequency. Every response frequency
will in turn be correlated with each of the stimulus components sharing the same temporal
modulation frequency. It is therefore not initially clear which stimulus component caused
which component of the response; the points on the MTFst corresponding to a given w cannot
be distinguished from one another. This ambiguity manifests itself in the form of a large
error ε, as evident in Figures 5D and E.

Because ε is dependent upon the (randomly assigned) phases of the MSst, it has an
incoherent structure that is distributed over the entire measurement, and its strength can
be reduced by averaging the results from multiple stimuli with different phases (or by using
more finely spaced w’s, i.e., increasing the base stimulus period T) (Klein et al., 2000).
This argument also applies to the manifestations of variability and even-ordered
nonlinearity; some odd-ordered distortion products, however, do not depend on the phases of
the stimulus frequencies (Victor and Shapley, 1980). The result obtained after averaging the
results from 30 different stimuli is shown in F and G (approximately the same result would
be obtained by extending T by a factor of 30). Despite a further decrease in SNR and
SNRcor, its similarity to STRFdr (Figure 3J) and STRFtorc (Figure 4J) is impressive; the
linear spectrotemporal processing of the neuron has maintained its form for more than an
hour, over vastly different stimulus types.
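
The averaging argument can be illustrated with a minimal numerical sketch. It is not the analysis code used for the measurements: it assumes a purely linear, noise-free response at a single temporal frequency w, a hypothetical column of eight MTFst values, and unit-power stimulus components with uniformly random phases. The function names are ours. Each point is estimated by cross-correlating the response with the corresponding stimulus component, so the estimate equals the true value plus cross terms whose phases are random.

```python
import numpy as np

def estimate_points(mtf_column, phases, a=1.0):
    """Estimate every MTF point that shares one temporal frequency w."""
    stimulus = a * np.exp(1j * phases)          # one Fourier component per ripple frequency
    response = np.sum(mtf_column * stimulus)    # linear response at w: sum over components
    return response * np.conj(stimulus) / a**2  # true value plus phase-dependent cross terms

def mean_error_power(mtf_column, n_stimuli, n_trials, rng):
    """Mean squared error after averaging estimates over n_stimuli random-phase stimuli."""
    errors = []
    for _ in range(n_trials):
        phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_stimuli, mtf_column.size))
        estimate = np.mean([estimate_points(mtf_column, p) for p in phases], axis=0)
        errors.append(np.mean(np.abs(estimate - mtf_column) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
# Hypothetical MTF values at one w (illustration only, not measured data).
mtf_column = rng.standard_normal(8) + 1j * rng.standard_normal(8)
single = mean_error_power(mtf_column, n_stimuli=1, n_trials=200, rng=rng)
averaged = mean_error_power(mtf_column, n_stimuli=30, n_trials=200, rng=rng)
print(f"error power ratio (1 stimulus / 30 stimuli): {single / averaged:.1f}")
```

Under these assumptions the cross terms are independent across stimuli and have zero mean, so their power should fall roughly in proportion to the number of stimuli averaged.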

3.1.4 Application of the Singular-Value Decomposition

In this section, we demonstrate the use of the SVD for producing approximations of the
measurements of the STRF and MTFst. Such approximations represent an optimal trade-off
between error reduction and signal loss, provided the errors are evenly distributed
over the measurements (Stewart, 1993; Hansen, 1998). The proportion of signal lost is gauged
by βSVD (see Methods).

The SVD of the STRFtorc from Figure 4J (again shown in Figure 6A) is illustrated
in Figures 6B through D. The singular values of the first 12 separable matrices from the
SVD are shown in B, along with the error-derived threshold (see Methods) indicated by the
dashed line. The first singular value, corresponding to the separable matrix in C, towers
over the others, and alone exceeds the threshold. The STRF is well described by this
separable matrix, while the sum of the remaining separable matrices, shown in D, consists of
unstructured measurement errors. Indeed, βSVD = 4.8%, indicating that more than 95% of
the STRF power is captured by this rank-1 approximation. That is, in large part this STRF
represents the product of independent spectral and temporal integration.

Figure 5: Measuring the STRF with STWN. A: Dynamic spectrum of a STWN with Ω’s between 0.2
and 1.4 cyc/oct and w’s between 4 and 24 Hz. 75 stimulus periods were used. B: Time-dependent
spike rate estimate, r[t]: Raw estimate (blue), the linear plus DC approximation (red), and
the response predicted from STRFdr (dashed black) and STRFtorc (dashed green). C: Response
Fourier Transform magnitude. The linear 4-24 Hz components (red) are barely distinct from the
noise (blue). Also shown is the square-root of the response variance (the standard error) as
a function of frequency (black). D, E: The measurements of MTFst and STRFstwn after one
stimulus. F, G: Same, averaged over 30 stimuli.

In contrast, the SVD of a different neuron’s STRFtorc is shown in Figures 6E through
J. This STRF (in E) does not look separable; for inputs at different tonotopic locations x,
the temporal integration by the neuron (in its network) is not related by a simple scaling
of the same function. In this case, the second singular value (in F) also protrudes above
the threshold, the rank-1 approximation (G) fails to describe the STRF’s obliqueness, and
βSVD is high at 28.2%. After including the second separable matrix (shown in H), the
approximation (in I) is vastly improved (βSVD = 6.7%), and the remainder (J) again chiefly
consists of unstructured errors.
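
The rank-1 and rank-2 approximations described above can be sketched as follows. This is an illustration only: the two STRFs are synthetic, the rank is chosen by hand rather than by the error-derived threshold (which is defined in Methods), and βSVD is assumed here to be the fraction of STRF power left out of the approximation, in line with the statement that βSVD = 4.8% leaves more than 95% of the power captured. The function name svd_approximation is ours.

```python
import numpy as np

def svd_approximation(strf, rank):
    """Return the rank-k SVD approximation and the fraction of power it discards."""
    u, s, vt = np.linalg.svd(strf, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]     # sum of the first k separable matrices
    beta = 1.0 - np.sum(s[:rank] ** 2) / np.sum(s ** 2)  # assumed definition of the lost power
    return approx, beta

# Synthetic examples (assumptions): x is tonotopic position, t is time.
x = np.linspace(-1.0, 1.0, 15)[:, None]
t = np.linspace(0.0, 1.0, 25)[None, :]
separable = np.exp(-x**2 / 0.1) * np.sin(4.0 * np.pi * t) * np.exp(-3.0 * t)
oblique = np.exp(-((x - 0.8 * (t - 0.5)) ** 2) / 0.05) * np.exp(-3.0 * t)

for name, strf in (("separable", separable), ("oblique", oblique)):
    for rank in (1, 2):
        _, beta = svd_approximation(strf, rank)
        print(f"{name:9s} rank-{rank}: power lost = {100.0 * beta:.1f}%")
```

A product of one spectral and one temporal function loses essentially no power at rank 1, while the tilted ridge needs additional separable matrices, mirroring the contrast between the two neurons of Figure 6.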

The SVD can alternatively be applied to the MTFst. While the SVD of the full MTFst
yields an approximation identical to that of the STRF, applying the SVD separately to
each of the quadrants of the MTFst will generally produce a different approximation. This
procedure is of interest chiefly because previous studies (using dynamic-ripple stimulation)
have suggested that the MTFst’s of AI neurons are well described as being quadrant-separable
(Kowalski et al., 1996b; Depireux et al., 2001), implying that the SVD of each quadrant of
the MTFst should yield at most one separable matrix of significance. Therefore, if the STRF
is not separable, it could be advantageous (in terms of error reduction) to approximate the
STRF in this manner. This principle is examined in Figure 7, using the non-separable STRF
from Figure 6E. The SVD of each of the upper two quadrants of the MTFst shown in 7B
yields the two sets of singular values in C. In each quadrant, only the first singular value
is pronounced and exceeds the threshold. This indication that the quadrants are indeed
separable is supported upon comparison of the original STRF (in A) with the
quadrant-separable approximation (for which βSVD = 6.6%) and the remainder, shown in D and E,
respectively. Intriguingly, the result is markedly similar to the rank-2 approximation of the
STRF from Figure 6I. By implication, the MTFst from the rank-2 approximation (shown
in H) is very similar to that from the quadrant-separable approximation (in F). The Fourier
Transforms of the corresponding remainders (in G and I) are also very similar.
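
The statement that the SVD of the full MTFst yields the same approximation as the SVD of the STRF follows from the two-dimensional Fourier transform being a unitary operation on each axis. The short check below is a sketch under stated assumptions: a random matrix stands in for an STRF, and the MTFst is taken to be its orthonormally scaled 2-D DFT. It covers only the full-matrix case and does not implement the quadrant-wise procedure.

```python
import numpy as np

def rank_k(matrix, k):
    """Rank-k truncation of the SVD (works for real or complex matrices)."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vh[:k, :]

rng = np.random.default_rng(1)
strf = rng.standard_normal((16, 32))   # stand-in for a measured STRF (assumption)
mtf = np.fft.fft2(strf, norm="ortho")  # unitary 2-D DFT, used here as the MTF

# Singular values are unchanged by the unitary transform.
sv_strf = np.linalg.svd(strf, compute_uv=False)
sv_mtf = np.linalg.svd(mtf, compute_uv=False)
print("largest singular values:", sv_strf[0], sv_mtf[0])

# Truncating in the MTF domain and transforming back gives the same STRF approximation.
back = np.fft.ifft2(rank_k(mtf, 2), norm="ortho")
print("max difference:", np.max(np.abs(back - rank_k(strf, 2))))
```

Applying the truncation separately to each quadrant breaks this equivalence, which is why the quadrant-separable approximation is a distinct approximation.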

In summary, we have demonstrated the use of the SVD for producing relatively error-free
approximations of the STRF or MTFst measurements. Later, in Section 3.4, we will examine
how well these three types of approximations (the rank-1, rank-2, and quadrant-separable
approximations) apply to the whole of the neuronal population, as a function of the error
level and the type of stimulation.

3.2 Direct Comparisons of STRFs Measured with Different Stimulus Types

In 45 out of 308 neurons whose STRFs we measured, we obtained multiple STRF measurements
using two or all three stimulus types. The resemblance between the first 125 ms
of each pair of measurements was quantified by the correlation coefficient (see Methods),
which was computed under four conditions: for the raw (pre-SVD) measurements, and for
the quadrant-separable, rank-2, and rank-1 approximations of the measurements.

The correlation coefficients from the raw comparisons are plotted in Figure 8A versus the
limiting (minimum) SNRcor of the two measurements. The squares, triangles, and circles
correspond to the three possible pairs of stimulus types compared. The trends followed by
all stimulus comparisons are similar. When SNRcor is above 1, the correlation coefficients
are high and are weakly affected by SNRcor. The correlation coefficients are only small when
SNRcor is small; as SNRcor descends to 0, so do the correlation coefficients. This mirrors the
relationship expected from two identical STRFs that are corrupted by independent errors,
as indicated by the solid-black Curve 1. In other words, it is the relationship produced when
the linear spectrotemporal processing of the system, summarized by the (error-free) STRF,
is impervious to changes in stimulus type, but the STRF measurement is error-prone.

Figure 6: Approximating the STRF with the SVD. A-D: An STRF that looks separable. A: The
original measurement. B: The singular values (bars) of the separable matrices of the SVD,
and the error-derived threshold (dashed line). C, D: The rank-1 approximation and the
remainder. E-J: An STRF that does not look separable. F: The singular values (bars) and
threshold (dashed line). G: The rank-1 approximation. H: The second separable matrix. I, J:
The rank-2 approximation, and the remainder.

Figure 7: Approximating the MTFst with the SVD. A, B: The original STRF measurement and the
corresponding MTFst magnitude (first two quadrants). C: The singular values (bars) and
thresholds (dashed lines) of the first two quadrants of the MTFst. D, E: The
quadrant-separable approximation of the STRF and the remainder. F, G: The MTFst magnitude
from the quadrant-separable approximation, and the Fourier Transform of the remainder. H, I:
The MTFst magnitude from the rank-2 approximation (from Figure 6I), and the Fourier Transform
of the remainder (from 6J).


Since the SVD approximations act to reduce errors, they should result in higher correlation
coefficients, provided the STRF measurements have similar signal components. These
properties are evident in the three dashed curves in 8A, which summarize the correlation
coefficients obtained from the quadrant-separable (Curve 2), rank-2 (Curve 3), and rank-1
(Curve 4) approximations of each pair of measurements (the data points are not shown, for
clarity). The curves fit the combined data from all three types of stimulus comparisons. The
fits were produced by modeling the error reduction as a multiplicative gain g in SNRcor
(see Methods). The values of g used for Curves 2-4 are 1.7, 1.9, and 2.9, respectively; these
values minimized the number of data points deviating by more than 0.1 units from the
curves (providing the most visually pleasing fits).
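
The shape of such curves can be sketched under a simple assumption that is ours, since the exact expression is given in Methods and is not reproduced here: two measurements share the same signal, carry independent errors, and have the same signal-to-noise ratio. Their expected correlation is then SNR/(1 + SNR), and a multiplicative gain g replaces SNR by g times SNR. The function names are hypothetical.

```python
import numpy as np

def expected_correlation(snr, gain=1.0):
    """Correlation of two equal-SNR measurements of one signal with independent errors."""
    snr = np.asarray(snr, dtype=float)
    return gain * snr / (1.0 + gain * snr)

def simulated_correlation(snr, n_points=200_000, seed=0):
    """Monte Carlo check: correlate two noisy copies of a unit-power random signal."""
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal(n_points)
    noise_sd = np.sqrt(1.0 / snr)  # error power = signal power / SNR
    first = signal + noise_sd * rng.standard_normal(n_points)
    second = signal + noise_sd * rng.standard_normal(n_points)
    return float(np.corrcoef(first, second)[0, 1])

print("simulated vs. expected at SNR = 2:",
      simulated_correlation(2.0), float(expected_correlation(2.0)))
for g in (1.0, 1.7, 1.9, 2.9):     # gains quoted in the text for Curves 1-4
    print(f"g = {g}: correlation at SNRcor = 1 is {float(expected_correlation(1.0, g)):.2f}")
```

Under this assumed form the curve passes through 0.5 at SNRcor = 1 for g = 1 and rises toward 1 as either SNRcor or g increases.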

For all data points exceeding the critical SNRcor = 1 level, Figure 8B shows the complete
range and the average of the correlation coefficients. Again, similar results are obtained no
matter which two stimulus types are compared. For the raw measurements, correlation
coefficients fall between 0.5 and 0.8, with an average of 0.64. The average rises to 0.73
and 0.75 for the quadrant-separable and rank-2 approximations, respectively. For the rank-1
approximations, the correlation is 0.85 on average, is as high as 0.97, and does not fall
below 0.74. The average correlations are still higher (0.71, 0.78, 0.80, and 0.88, respectively)
when the comparisons are further restricted to the half-sized rectangular region containing
the most power (e.g., the dashed box in the top row of 8C), as indicated by the x’s. Least
affected are the rank-1 comparisons, suggesting that they are already relatively error free.
Note that these values far surpass those typically produced by comparing the STRFs of
different neurons; for example, when the rank-1 approximation of a neuron’s STRFtorc was
compared to the rank-1 approximation of the subsequent neuron’s STRFstwn, the average
correlation was 0.03.

Figure 8: Similarity between STRFs measured with different stimulus types: Dynamic ripple (DR),
TORC (TC) and STWN (WN). DR-TC, e.g., denotes comparisons between the dynamic-ripple and TORC
STRFs. Correlation coefficients were computed between the original (raw) measurements, and
between the quadrant-separable (q-sep), rank-2 and rank-1 approximations of each measurement.
A: Correlation coefficients are plotted versus the minimum SNRcor of the two original
measurements. The squares, triangles, and circles correspond to the raw comparisons; different
symbols correspond to the different pairs of stimulus types compared (see legend). Curve 1
(solid black) is the relationship expected from two identical STRFs with independent errors.
Curves 2, 3, and 4 (dashed curves) are fits to the correlation coefficients obtained from
the quadrant-separable, rank-2, and rank-1 approximations of each measurement, respectively
(see text). B: The complete range (vertical lines) and the average (squares, triangles, and
circles) of the correlation coefficients are shown for all comparisons where the minimum
SNRcor was above 1. Also shown (black x’s) are the average correlation coefficients, for all
pairs of stimulus comparisons combined, obtained when the comparison is further limited to the
half-sized rectangular region of the STRF containing the most power (see, e.g., the dashed box
on the top left STRF of Column C). Columns C, D, E: In each row, the STRFs of the same neuron
measured with different stimulus types are shown side by side. Shown are either the rank-1 or
rank-2 approximations of the STRFs, depending on what was optimal for the measurement with
the highest SNRcor. Listed to the left are correlation coefficients obtained from each pair
of comparisons and the SNRcor of the original measurements.

Some visual comparisons of STRF measurements are available in Columns C through E
of Figure 8. For each comparison, either the rank-1 or rank-2 approximations are shown,
depending on what was optimal for the STRF with the highest SNRcor. In C are results
from three neurons that were tested with all three stimulus types. A typical rank-1 result is
shown in the top row. The STRFs match in many details, including the suppressive areas
and the multiple excitatory areas. In the middle row is a rank-2 example with somewhat
lower-than-average correlation coefficients. While some features match well across stimuli,
there is an increase in background fluctuations between STRFdr and STRFstwn that limits
the comparisons. The rank-1 approximations may have been more appropriate here (and
these yielded correlation coefficients over 0.8). In the bottom row is an unusual rank-2
example, where the STRF peak shifts to a higher frequency, thus diminishing the correlation
coefficients. However, SNRcor of the STRFstwn was only 0.8, so it is difficult to make
definite claims about its structure. Results from additional neurons that were tested with
two of the three stimulus types are provided in D and E. Overall, a wide variety of STRF
shapes, including unusual “offset” types (E, top row), are well preserved across stimulus
type. To be sure, there is much less variation in STRF shape across stimulus type than
there is across neurons.

In  summary,  both  visual  and  quantitative  comparisons  reveal  a  close  resemblance  between 
the  STRFs  measured  with  different  stimulus  types.  The  resemblance  predictably  increases 
as  the  limiting  SNRcor  of  the  measurements  increases;  similarly,  using  the  SVD  to  reduce 
the  error  level  only  serves  to  increase  their  resemblance.  The  highest  correlation  coefficients 
result from the rank-1 approximations, indicating that they are the most error-tolerant.
Similar  results  are  obtained  no  matter  which  of  the  three  possible  pairs  of  stimulus  types 
are  compared.  By  the  same  token,  a  wide  variety  of  STRFs  are  observed  across  neurons. 
Together,  these  observations  indicate  that  linear  spectrotemporal  processing  is  a  robust 
property  of  AI  that  takes  diverse  forms  in  individual  neurons. 

3.3  The  Sources  and  Stimulus  Dependence  of  Measurement  Error 

In Section 3.2, it was shown that the signal component of the STRF measurement, seen
through the corrective lens of the SVD, is not crucially dependent on the stimulus type.
Instead, the ability of the SVD to separate this signal from the measurement errors is crucially
dependent on SNRcor, which may depend on the stimulus type. In this section, we examine
the sources contributing to SNRcor and their stimulus dependence.

3.3.1  Systematic  Error 

The capacity of systematic errors to limit the quality of the measurements is evident in
the relationship between SNR and SNRcor. This relationship, observed over all measurements
for each stimulus type, is plotted in Figure 9 (with second-degree polynomial fits
where appropriate). For both the TORC (A; F, Curve 1) and the STWN (D; F, Curve 4)
measurements, SNRcor shows a clear saturating characteristic as SNR increases. Recall
that SNRcor incorporates both the non-systematic and systematic errors, while the SNR
incorporates only the non-systematic errors. Therefore, as the measurements become more
reliable (SNR increases), the saturation of SNRcor evinces the systematic error that
dominates when the non-systematic errors are sufficiently small. The relative significance of
the systematic error component is revealed in the level to which SNRcor is limited in the high
SNR measurements.

Figure 9: The relationship between SNR and SNRcor across all measurements for each stimulus
type, and second-degree polynomial fits (black curves) when appropriate. The level of
saturation of these curves indicates the relative levels of systematic errors in the
measurements. A: TORC. B: TORC without inverse-repeat, thus retaining systematic errors due to
even-order nonlinearities. C: TORC control, discarding half of the stimulus presentations.
D: STWN. E: Dynamic ripple. F: Comparison of polynomial fits: Curves 1-4 are from Figures A-D.
Curve 5: STWN, discarding half of the stimuli, thus increasing systematic errors induced by
the stimulus (ε). Curve 6: STWN control, discarding half of the stimulus presentations.

Recall that for the TORC measurements (A; F, Curve 1), the inverse-repeat method was
employed in order to remove systematic errors due to even-order nonlinearities. Therefore,
the saturation of SNRcor in the TORC measurements should be worsened if the inverse-repeat
method is not used. Indeed, bypassing the inverse-repeat method did further limit
SNRcor (B; F, Curve 2), by a factor of about 2.5. Note that this is not simply a side effect
of SNR reductions caused by discarding half of the data, for it is not observed if half of the
stimulus presentations are discarded but inverse-repeat is still employed (C; F, Curve 3).

In the STWN measurements (D; F, Curve 4), the systematic errors are much more severe
than in the TORC measurements; the limiting value of SNRcor is at least 4 times lower,
and so SNRcor is much less likely to exceed usable values. SNRcor is also less variable
across the high-SNR measurements; when the measurements are reliable, which is fairly
often, SNRcor reliably reaches its limited potential. This potential is further cut in half by
discarding half of the stimuli (F, Curve 5), but not by discarding half of the presentations of
each stimulus (Curve 6). In sum, these observations suggest that the errors are dominated
by the nonideality of the STWN stimuli (i.e., ε), to which all neurons were exposed. Our
simulations also supported this view. Therefore, at least 4 times as many STWN stimuli
would have to be used in order to raise the SNRcor potential to the level of the TORC
method.

Finally, note that the relationship between SNR and SNRcor is less clearly defined in
the dynamic-ripple measurements (E) (although both SNRcor and SNR often surpass the
values achieved by the other two stimulus types). In our experience, this is largely because
the errors are not uniformly distributed over the dynamic-ripple STRFs (Depireux et al.,
2001), due to the outer-product operation in the construction of the MTFst. As a result,
SNRcor is a less reliable gauge of the overall error level in the dynamic-ripple measurements.

3.3.2  Non-Systematic  Error 

In  Section  3.3.1,  it  was  shown  how  the  potential  accuracy  of  the  STRF  measurements  is 
limited  by  the  level  of  systematic  error,  which  depended  on  the  stimulation  method.  However, 
if  a  method  is  to  achieve  a  given  level  of  accuracy  within  its  potential,  it  is  evident  in  Figure 
9  that  the  SNR  (which  reflects  the  level  of  non-systematic  error)  must  be  at  least  minimally 
adequate.  In  this  section,  we  explore  how  the  SNR  is  determined  from  the  interplay  between 
the  stimulus,  the  STRF,  and  the  neuronal  response. 

To set the stage, recall from Eq. (5) that a single stimulus-response pair results in
the measurement of a set of one or more points on MTFst[w, Ω], the set being determined by
the spectrotemporal modulation frequency content of the stimulus. By Eq. (6), the variance
of each point (w, Ω) is a fixed proportion, namely 1/a², of the variance of the response’s
Fourier Transform at the corresponding (temporal) frequency w (a² is the power of each of
the spectrotemporal modulation frequencies in the stimulus). Now, consider the whole of
the MTFst measurement, built stimulus-by-stimulus (depicted, e.g., in Figures 4E-J). To
simplify matters, we will first consider the situation in which every point of the measurement
has resulted from a single stimulus-response pair; that is, prior to the TORC inverse-repeat
procedure, the STWN phase-averaging procedure, or the dynamic-ripple outer-product
operation. In that case, to find the variance of any point on the MTFst, one needs only to find
the variance of the appropriate response at the appropriate frequency, and weight it by 1/a².
Consequently, the average variance of the entire MTFst (and STRF) measurement, ⟨σ²⟩,
is simply 1/a² times the average variance at all of the relevant frequencies of all responses.
The SNR is then the ratio of P (the STRF signal power) to this number.
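
The bookkeeping in this paragraph amounts to two lines of arithmetic, sketched below for the single stimulus-response pair case. The function name, argument names, and numerical values are illustrative assumptions, not quantities from the measurements.

```python
import numpy as np

def strf_snr(signal_power, response_variances, a_squared):
    """SNR when every MTF point comes from a single stimulus-response pair.

    signal_power       -- P, the STRF signal power
    response_variances -- variance of each response's Fourier Transform at each
                          relevant temporal frequency (any array shape)
    a_squared          -- power of each spectrotemporal modulation frequency in the stimulus
    """
    mean_response_variance = float(np.mean(response_variances))
    strf_variance = mean_response_variance / a_squared  # average variance of the measurement
    return signal_power / strf_variance, strf_variance

# Made-up numbers: two responses, three relevant frequencies each.
variances = np.array([[3.0, 4.0, 5.0], [4.0, 4.0, 4.0]])
snr, strf_variance = strf_snr(signal_power=2.0, response_variances=variances, a_squared=2.0)
print(snr, strf_variance)
```

The later averaging steps (inverse-repeat pairs, multiple STWN stimuli) would reduce the measurement variance further and are deliberately left out of this sketch.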

What determines the variance of a response’s Fourier Transform? Two observations lead
to a simple answer. First, as Figures 3D, 4D, and 5C typify, the variance of r[w] is nearly
frequency-invariant. Therefore, the average variance over the relevant frequencies is closely
related to the average variance over all frequencies. Now, the average variance over all
frequencies equals the average variance over all times (Papoulis, 1962; Oppenheim and Schafer,
1989), which ties in the second observation: The variance of r[t] is proportional to r[t]/n
(where n is the number of stimulus periods). This originates from a linear relationship
between the sample mean and the sample variance of the binned spike train responses (y[t]),
which is a widely reported observation (Shadlen and Newsome, 1998). Consequently, the
average variance over time is proportional to the average spike rate over time, r̄. So finally,
all else being equal across stimuli, r̄ (over all responses) can be treated as the lone variable
determining the average variance of the responses over the relevant frequencies. The
relationship observed across all STRF measurements is shown in Figure 10A, where the variance
has been transformed into the variance of a single response period by multiplying by n (thus
correcting for differences in n across measurements). The trend across all neurons is indeed
linear (on this log-log plot, the slopes of the linear fits to the data were very close to 1), and
is only weakly influenced by stimulus type.
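
Both steps of this argument can be checked numerically. The sketch below assumes Poisson spike counts (a stronger assumption than the linear mean-variance relationship invoked in the text), a made-up periodic rate profile in spikes per bin, and an orthonormally scaled DFT so that average variance is preserved between the time and frequency domains. The function name is ours.

```python
import numpy as np

def rate_estimate_variance(rate, n_periods, n_experiments, rng):
    """Variance of the period-averaged rate estimate, in time and in frequency."""
    # counts[e, p, t]: spikes in bin t of period p of simulated experiment e
    counts = rng.poisson(rate, size=(n_experiments, n_periods, rate.size))
    r_hat = counts.mean(axis=1)                      # r[t]: average over the n periods
    var_time = r_hat.var(axis=0)                     # variance across experiments, per bin
    r_fft = np.fft.fft(r_hat, axis=1, norm="ortho")  # unitary DFT of each estimate
    var_freq = r_fft.var(axis=0)                     # variance per temporal frequency
    return var_time, var_freq

rng = np.random.default_rng(2)
rate = 2.0 + np.sin(2.0 * np.pi * np.arange(64) / 64.0)  # hypothetical rate profile
n = 20
var_time, var_freq = rate_estimate_variance(rate, n_periods=n, n_experiments=400, rng=rng)
print("mean variance over time:       ", var_time.mean())
print("mean variance over frequencies:", var_freq.mean())
print("average rate divided by n:     ", rate.mean() / n)
```

With Poisson counts the proportionality constant between variance and rate is 1; real responses may follow a different constant without changing the argument.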

In contrast, the choice of stimulus type produces order-of-magnitude differences in a² (due
to differences in the number of spectrotemporal modulation frequencies per stimulus; recall
Figure 2). This in turn strongly affects the STRF variance ⟨σ²⟩ for a given average spike
rate r̄. Given the relationship observed between r̄ and average response variance in 10A, the
predicted relationship between r̄ and ⟨σ²⟩ (again scaled by n) for each stimulus type is
indicated by the dashed lines in B. Note, however, that for a given neuron, the actual effect
of stimulus type on ⟨σ²⟩ depends on how r̄ is also affected. Curiously, we have seen little
evidence for a significant effect of stimulus type on r̄. From one type to the next, up to
factor-of-two increases or reductions in r̄ were typical, but this variation is not systematic
and is small compared to that of a².

The actual relationship between the average spike rate and the STRF variance observed
across all STRF measurements is indicated by the data points plotted in B. The discrepancies
between these trends and the dashed lines, where they exist, are easily explained by the fact
that not every point of the actual MTFst measurements is the result of just one
stimulus-response pair, as we have so far assumed. For the STWN stimuli, MTFst[w, Ω] was the
average result from 30 stimulus-response pairs; therefore, its actual variance ⟨σ²⟩ (black
diamonds) was lower than the black (upper-most) dashed line by a factor of 30. This largely
compensated for the difference in a² between the STWN and TORC stimuli. Similarly, the
inverse-repeat method effectively averages the results from two sets of stimuli, and so the ⟨σ²⟩
of the final TORC result (red circles) was cut in half with respect to the red (middle) dashed
line. Finally, we observed that the ⟨σ²⟩ of the final dynamic-ripple MTFst (blue dots), each
point of which results from the normalized product of two individual measurements, was
typically similar to that of the measured cross-sections alone. Therefore, its relation to r̄
was similar to the blue (lower-most) dashed line, albeit with quite a bit of scatter. Overall,
these properties conspired to produce SNRs that were, on average, a factor of 5 lower in the
TORC measurements than in the dynamic-ripple measurements, and an additional factor of
2 lower in the STWN measurements.

For  each  stimulus  type,  the  average  spike  rate  f  observed  across  neurons  ranged  over 
roughly  two  orders  of  magnitude.  Figure  IOC  shows  that  the  value  of  f  is  partially  predicted 
by  the  STRF  power  P,  in  that  r,  and  more  strictly  its  lower  bound,  tends  to  grow  by  the 
square-root  of  P  (the  black  line  on  this  log- log  plot  has  a  slope  of  1/2).  A  square-root  re¬ 
lationship  is  expected  from  the  linear  response  model  followed  by  rectification:  Generally 
speaking,  STRFs  (and  MTFst ’s)  with  higher  magnitudes  result  in  spike  rates  with  pro¬ 
portionally  stronger  modulations,  which,  since  the  spike  rate  must  be  positive,  result  in 
proportionally  higher  r’s;  meanwhile,  P  grows  as  the  square  of  the  STRF  magnitudes.  Since 
f  translates  linearly  into  variance,  this  implies  that  STRFs  with  higher  average  power  P, 
Figure 10: The sources and stimulus dependence of SNR, for the dynamic ripple stimuli
(blue dots), TORCs (red circles), and STWN (black diamonds). A: The linear relationship
between the average spike rate r and the average variance of the response’s Fourier
transform; r is averaged over all responses. The variance is averaged over all responses,
but only over those temporal frequencies of each response relevant to the MTFst measurement
(where the corresponding MSst magnitude was nonzero). The variance is scaled by n (the
number of stimulus periods) to correct for differences in n across the measurements, and
thus represents the variance of a single response period. B: (dashed lines) The expected
relationships between r and ⟨σ²⟩ (scaled by n) in the case where each point of the MTFst’s
is obtained from a single stimulus-response pair. The actual relationships observed
(plotted points) differ from the dashed lines by an amount predicted by the number of
stimulus-response pairs whose results are averaged to obtain the final MTFst (see text).
C: The lower bound of r is proportional to the square-root of the STRF signal power P (the
diagonal line’s slope is 1/2). The square-root law is expected from a
linear-plus-rectification response model, but the scatter in r suggests additional sources
of variability.


although  associated  with  higher  absolute  levels  of  variability,  have  the  potential  to  achieve 
higher  SNRs;  and  this  potential  is  realized  in  those  neurons  with  the  lowest  r  allowed  for  a 
given  P.  Note  that  the  data  from  all  stimulus  types  overlap,  reinforcing  the  idea  that  r  is 
not  significantly  affected  by  stimulus  type. 
In summary, the ingredients of SNR are of two largely independent varieties: properties
of the stimulus and properties of the auditory system. The key stimulus properties boil
down to the power in each spectrotemporal modulation frequency a2, to which the SNR
is inversely proportional, and the number of stimulus-response pairs used to measure each
point of the MTFst (including n, the number of periods of each stimulus), to which SNR is
proportional. The system properties reduce to the STRF power P and the average spike rate
r, to which the SNR is proportional and inversely proportional, respectively. Furthermore,
r can be seen as the sum of two positive-valued components. One is proportional to the
square-root of P, as predicted by a linear-plus-rectification response model. The other is
not obviously related to the STRF, and represents an additional source of variability that
varies in strength from neuron to neuron. The net result is that an increase in P serves to
increase the SNR, while, for a given P, an increase in r counteracts this effect.
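
The square-root law predicted by a linear-plus-rectification model can be illustrated with a
minimal sketch. It assumes, purely for illustration, a zero-mean sinusoidal linear response
that is half-wave rectified; the function name, the scalar `gain` (standing in for the
overall STRF magnitude), and the use of the linear response's power as a proxy for P are
all assumptions, not part of the measurements reported here.

```python
import numpy as np

def rectified_stats(gain, n_samples=1000):
    """Return (mean rectified rate, mean power of the linear response).

    `gain` is a hypothetical scalar standing in for the STRF magnitude.
    """
    t = np.arange(n_samples) / n_samples          # one stimulus period
    linear = gain * np.sin(2 * np.pi * 4 * t)     # zero-mean linear response
    rate = np.maximum(linear, 0.0)                # half-wave rectification
    return rate.mean(), np.mean(linear ** 2)

gains = np.array([1.0, 2.0, 4.0, 8.0])
stats = np.array([rectified_stats(g) for g in gains])
mean_rate, power = stats[:, 0], stats[:, 1]

# Slope of log(mean rate) versus log(power): scaling the gain scales the
# rate linearly and the power quadratically, so the slope is 1/2.
slope = np.polyfit(np.log(power), np.log(mean_rate), 1)[0]
```

In this toy model every point falls exactly on the slope-1/2 line; it does not attempt to
reproduce the additional, neuron-specific component of the spike rate noted above.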

3.4  Sufficiency  and  Error  Dependence  of  the  SVD-Based  Approximations

In  Section  3.2,  the  SVD  approximations  of  STRFs  measured  with  different  stimulus  types 
were  found  to  be  highly  similar  when  SNRcor  (which  reflects  the  level  of  measurement  error) 
was  adequate  in  both  measurements.  The  stimulus  dependence  of  SNRcor  was  then  analyzed 
in  detail  in  Section  3.3.  In  this  section,  we  further  examine  how  the  SVD  approximations  are 
influenced  by  SNRcor.  Primarily,  we  are  concerned  with  the  extent  to  which  measurement 
errors  may  prevent  the  SVD  from  resolving  features  of  the  “true”  (error-free)  STRF. 

For this purpose, it would be useful to know the proportion of the true STRF’s power
lost from an SVD approximation of the measurement. Unfortunately, in the presence of
measurement error, this quantity is not precisely knowable. One way to estimate it is to
compute the proportion of the STRF measurement’s power lost from an SVD approximation,
which we call α_SVD (Depireux et al., 2001). In total, we will consider α^1_SVD, α^2_SVD,
and α^Q_SVD, which speak to the sufficiency of the rank-1, rank-2, and quadrant-separable
approximations, respectively. One obvious disadvantage of α_SVD is that it is inflated in
the presence of measurement errors (which comprise much of the measurement’s lost power).
This is evident in Figures 11A through C, where α^1_SVD (A), α^2_SVD (B), and α^Q_SVD (C)
are plotted versus SNRcor for all TORC and STWN STRFs (recall that SNRcor is unreliable for
the dynamic-ripple STRFs). The influence of SNRcor on α_SVD clearly persists up to high
SNRcor’s.
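
The proportion of a measurement's power lost from a rank-k SVD approximation can be
sketched as follows. This is a minimal illustration, assuming that "power" means the sum of
squared matrix entries (equal to the sum of squared singular values); the function name and
the toy matrix are hypothetical, and the quadrant-separable variant is not covered.

```python
import numpy as np

def power_lost(strf, rank):
    """Fraction of the matrix's power lost by its best rank-`rank` approximation."""
    s = np.linalg.svd(strf, compute_uv=False)   # singular values, descending
    return float(np.sum(s[rank:] ** 2) / np.sum(s ** 2))

# Hypothetical toy "STRF": the sum of two separable (outer-product) terms.
x = np.arange(8.0)     # stand-in spectral axis
t = np.arange(10.0)    # stand-in temporal axis
toy = 3.0 * np.outer(np.sin(x), np.cos(t)) + 1.0 * np.outer(np.cos(x), np.sin(t))

lost_rank1 = power_lost(toy, 1)   # nonzero: the toy matrix is not separable
lost_rank2 = power_lost(toy, 2)   # essentially zero: the toy matrix has rank 2
```

On a noisy measurement the same function also counts discarded error power as "lost",
which is the inflation described above.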

We reduced the dependence of α_SVD on the error level by removing the effect of the
non-systematic errors (see Methods). The improved measure, β_SVD, is a more accurate gauge
of the proportion of lost STRF power, especially when the systematic errors are small
(e.g., in the TORC measurements). In theory, β_SVD should be more tolerant than α_SVD to
changes in SNR, and α_SVD should converge down to β_SVD with increasing SNR. These
properties are verified in Figures 11D through F, where β_SVD (red circles) and α_SVD
(black dots) are plotted versus SNR for the TORC measurements (the only caveat is that at
very low SNRs, β_SVD becomes unstable). We conclude (with additional support from our
simulations) that at moderate to high SNRs, the effect of non-systematic error is
accurately removed in the computation of β_SVD. Therefore, β_SVD estimates the proportion
of the systematic part of the STRF measurement relegated to the SVD remainder, and better
reflects the true STRF’s structure. To be conservative, we will consider β_SVD only in
those measurements
Figure 11: Sufficiency of the SVD approximations as a function of the error level. A-C:
α_SVD, the proportion of the STRF measurement’s power lost from the SVD approximations,
for all TORC (black o’s) and STWN (red x’s) measurements. D-F: β_SVD (red o’s) and α_SVD
(black dots) versus SNR for all TORC measurements. β_SVD estimates the proportion of the
systematic part of the STRF measurement relegated to the SVD remainder, and therefore
better reflects the true STRF’s structure. At very low SNRs, β_SVD is unstable (some points
lie beyond the axis limits). G-I: β_SVD versus SNRcor for all TORC measurements with SNR
above 1.5. Black +’s and red x’s denote those measurements optimally approximated by rank-1
(separable) and rank-2 (non-separable) matrices, respectively. With β^1_SVD (G) typically
as high as 25%, many STRFs are not well described by the rank-1 approximations. In
contrast, β^2_SVD (H) and β^Q_SVD (I) are typically well below 10%, indicating that all
STRFs are well described by both the rank-2 and quadrant-separable approximations. The
unusually high β_SVD’s at the lowest SNRcor’s indicate that the SVD is unable to resolve
the structure of some non-separable STRFs with high error levels. J-L: α′_SVD, computed
as α_SVD but from the quadrant-separable (J, K) and the rank-2 (L) approximations of the
TORC (black o’s) and STWN (red x’s) measurements. M-O: As expected, the α′_SVD’s are well
matched to the corresponding β_SVD’s in those TORC measurements with SNRcor above 2.


with SNR’s over 1.5.

The relationship between β_SVD and SNRcor for the 82 TORC measurements meeting
this criterion is plotted in Figures 11G through I. The blue +’s and red x’s denote the 50
and 31 measurements optimally approximated by rank-1 and rank-2 matrices, respectively
(the lone rank-3 approximation is not shown). At moderate to high SNRcor’s (e.g., above
2), the β_SVD distributions are only weakly dependent on SNRcor. In other words, the
SVD approximations are only weakly affected by measurement errors, and β_SVD should
therefore more accurately reflect the structure of the true STRF. Accordingly, the typical
range of β^1_SVD (Figure 11G), roughly from 3% to 25%, indicates that many STRFs are poorly
described by rank-1 approximations. It is reassuring that the lower and upper portions of
this range are dominated by the measurements optimally approximated by rank-1 and rank-2
matrices, respectively. However, the boundary between the two populations progressively
shifts from about 5% at the highest SNRcor to nearly 15% at the lowest SNRcor. This reflects
the fact that the optimal trade-off between error reduction and signal loss afforded by the
SVD approximations gets worse as SNRcor decreases; at higher error levels, the true STRF
must be further from being rank-1 before the second separable matrix of the SVD becomes
dominantly signal and is included in the approximation.

Over this same range of suitably high SNRcor’s, β^2_SVD (H) and β^Q_SVD (I) are universally
bounded below 10%, with averages of 3.1% and 3.6%, respectively. That is, the true STRFs are
almost completely contained within both the rank-2 and quadrant-separable approximations
of TORC measurements with suitably low error levels. Indeed, as was illustrated in Section
3.1.4, the two approximations were usually very similar.

When SNRcor is low, a handful of measurements have conspicuously high values of β^1_SVD
(G), β^2_SVD (H), or β^Q_SVD (I). There are three plausible reasons for this: (1) The
systematic errors in these measurements are unusually large (thus inflating β_SVD); (2) The
true STRFs are actually poorly described by these SVD approximations, and coincidentally
the measurements have a high error level; (3) Because of the high error level, the SVD of
these STRFs’ shapes is being disrupted, and more STRF power is being lost than otherwise
would be. We favor the last reason, since (despite the error level) most of these STRFs
appear to have non-separable shapes. Such STRFs are also found at higher SNRcor’s, but
these high values of β^2_SVD and β^Q_SVD are not found at higher SNRcor’s.

Although they are needed to fully describe many STRFs, the trade-off to using the rank-2
or quadrant-separable approximations instead of the rank-1 approximations is that they
retain a higher proportion of the measurement error. This was earlier indicated in Figures
8A and B. Similarly, for these TORC measurements, we estimated (using the bootstrap
method) that the SNR of the rank-1 approximation is on average 3.4 ± 0.6 times higher than
that of the raw measurement, while for the rank-2 and quadrant-separable approximations,
the average gain in SNR is reduced to 2.0 ± 0.6 and 1.9 ± 0.6, respectively. Note that
these values are comparable to the SNRcor gain values g employed in Section 3.2. Although
the rank-1 approximations have higher SNRs, which means that they remove proportionally
more noise than signal from the measurements, the proportion of signal removed (as gauged
by β_SVD) is unacceptably high for many STRFs.

In order to cross-check the results obtained from β_SVD, we recomputed α_SVD from the
SVD approximations (denoted by α′_SVD), rather than from the raw measurements. For
example, if the quadrant-separable approximation is indeed a complete and relatively
error-free version of the true STRF, then computing α′^1_SVD and α′^2_SVD from it should
yield results close to the corresponding β^1_SVD and β^2_SVD (from the raw STRF
measurement). Similarly, computing α′^1_SVD from the rank-2 approximation should yield a
result close to β^1_SVD. These α′_SVD’s are plotted in Figures 11J through L versus SNRcor
for both the TORC and STWN measurements. With respect to the original α_SVD’s in 11A
through C, they are more tolerant to changes in SNRcor over a wider range of SNRcor’s. When
SNRcor is above 2, these α′_SVD’s are indeed closely matched to the corresponding β_SVD’s,
as Figures 11M through O attest. When SNRcor drops below 1, the α′_SVD’s rapidly increase
and lose their correspondence with β_SVD, presumably because the assumption that the SVD
approximations are complete and error-free rapidly breaks down.

In this section, we have concentrated on the TORC measurements. They are ideal in
that they produced low levels of systematic error and a wide range of SNRcor’s. The STWN
measurements were less than ideal in that SNRcor was limited to below 2. In Section 3.3.1,
this was found to be chiefly due to high levels of stimulus-induced systematic error; indeed
β_SVD was grossly inflated in these measurements, rendering it no more illuminating than
α_SVD (not shown). Nevertheless, over the range of SNRcor in which they can be compared,
the distributions of α_SVD in Figures 11A through C and α′_SVD in J through L were very
similar for the STWN and TORC measurements. Moreover, from Section 3.2, the SVD
approximations of STWN and TORC measurements were increasingly well matched as the
error level dropped. Therefore, the available evidence supports the hypothesis that, for
a given level of measurement error, the STWN results and TORC results are equivalent,
but the STWN results are much more error prone. The dynamic-ripple results were less
than ideal in that STRFdr is quadrant-separable by construction. Additionally, it contains
non-uniformly distributed errors (Depireux et al., 2001), which complicates both the SVD
(Stewart, 1993) and the interpretation of SNRcor. With this caveat, we note that the
distribution (although not the range) of β_SVD was skewed toward somewhat higher values in
the dynamic-ripple measurements. For instance, β^1_SVD exceeded 10% in 61% of STRFdr’s
versus 45% of STRFtorc’s. Still, β^2_SVD was below 5% in 91% of STRFdr’s; the indications
were that most STRFdr’s were still well described by rank-2 approximations.

In summary, the optimal SVD approximation of an STRF measurement with a sufficiently
low error level (e.g., SNRcor above 2) does well describe the STRF, in that it preserves at
least 90% of the STRF’s power. Therefore, we can be confident that if the SVD
approximations of two STRF measurements are well matched, so are the corresponding STRFs.
However, when there exist higher levels of measurement error, this is no longer guaranteed
to be the case, particularly for STRFs that contain a significant non-separable component.
Overall, around 60% of the TORC measurements were well described as being separable.
The rest were better served by both rank-2 and quadrant-separable approximations, which
were essentially identical. To the extent that they could be compared, the STWN and
dynamic-ripple measurements produced similar results.

4  Discussion 

The STRF defines the space of spectrotemporal patterns that exert a linear influence on a
neuron’s firing rate. A random exploration of this space, fostered by the traditional
methodology of reverse correlation, has been the basis of most previous STRF measurements.
Instead, we applied a deterministic and analytical reformulation of reverse correlation,
which is based upon the Fourier-series description of dynamic spectra. One advantage of
this approach concerns experimental optimization: It enables us to restrict the stimulus
space to a minimal, discrete set of spectrotemporal patterns (the spectrotemporal
modulation frequencies, presented simultaneously or individually). It also facilitates our
understanding of measurement errors and their various stimulus- and response-induced
components. In sum, it enables us to design stimuli that are optimally efficient and
effective, while taking into account general knowledge of the STRF structure, response
nonlinearity and variability, and specific laboratory constraints. A second advantage
concerns experimental evaluation: Since any given dynamic spectrum can be described by its
Fourier series, we can understand and quantify the performance of different stimulation
methods, even if they were devised within different frameworks. Both of these advantages
have been demonstrated in this study, where we have measured STRFs of AI neurons with
three very different types of stimuli.

We  now  discuss  the  major  empirical  results  of  this  study. 

4.1  Linearity 

The most striking finding is that when the STRF of an AI neuron is successfully measured
with different types of stimuli, the results are very similar. The STRFs themselves exhibit
a high degree of richness and diversity across neurons. The three types of stimuli used
(dynamic ripples, TORCs, and STWN) differ greatly in their spectrotemporal characteristics
and statistics (cf. Figures 2, 3A, 4A, and 5A), and indeed they sound quite distinct from
one another. Great differences even exist between stimuli of a given type (except for
STWNs, which all sound noise-like). STRFs measured from such widely different stimuli
cannot be the same unless these neurons’ responses are strongly linear with respect to the
dynamic spectra of stimuli. Strong nonlinearities would not allow the STRFs generated from
such different stimuli to have such large correlation coefficients (except in trivial cases
such as static nonlinearities, e.g., rectification). The correlation coefficients are
especially large considering that they were computed over regions of the measurements
containing little STRF power but still containing errors (even after the SVD), and
furthermore the total duration of all stimulus presentations often exceeded an hour.

4.2  Efficacy  of  the  stimuli 

Although, when successful, they lead to very similar STRF measurements, the three types
of stimuli differ in their rates of success. Success is achieved when the STRF measurement
contains sufficiently low levels of both non-systematic and systematic errors, reflected by
the measures of SNR (using only non-systematic error) and SNRcor (including systematic
error). Non-systematic errors, caused by response variability, are reduced when the
modulations in the stimulus are more powerful (evoking stronger modulations in the response
relative to the average spike rate), and also by averaging the results from stimuli with
identical spectrotemporal statistics. Systematic errors, caused when multiple stimulus
components evoke interfering response components (either linearly or nonlinearly), are
reduced by careful stimulus design, or by averaging the results from stimuli with identical
spectrotemporal statistics (but different individual characteristics). Note that all of the
stimulus types used had approximately the same total presentation duration.

On balance, the stimuli that gave the best results were TORCs, which benefitted from
careful stimulus design and relatively strong responses. As a result, we have noted that
usable STRF measurements could have been obtained after presenting one sweep of each TORC
stimulus (taking about 3 minutes), a fact that we intend to exploit in the future. STWN,
while strongly motivated by the traditional reverse correlation methodology, gave STRFs
with substantially more systematic error than TORCs. While both stimuli are capable of
giving STRFs with high SNR, the STWN results in substantially poorer SNRcor. This is
most cleanly seen by comparing Figures 9A and 9D: both stimulus types give STRFs
with SNR as high as 30, but STWN-generated STRFs have SNRcor’s that saturate below 2,
while TORC-generated STRFs have SNRcor’s saturating at substantially higher values.

Although the dynamic-ripple stimuli produced the most reliable results (highest SNR),
they suffered a fundamental flaw: Too many stimuli were required to measure the full MTFst
(and hence its STRF), and so measurements were restricted to the subset of stimuli required
if the MTFst is quadrant-separable. This is problematic for two main reasons. First, it
makes it impossible to assess the quadrant-separability assumption directly. Although
quadrant-separability holds in (ketamine-anesthetized) AI, there may be other neuronal
populations or experimental conditions for which it doesn’t. Second, the full MTFst
measurement is a more complex (nonlinear) function of the individual stimulus-response
relationships. This complicates the evaluation of measurement errors, and thus blurs the
distinction between neural functionality and methodological artifact. Indeed, the
dynamic-ripple results had a few subtle idiosyncrasies, including more non-separable STRFs,
and SNRcor’s poorly correlated with other assays of measurement errors (SNR) and STRF
structure (α_SVD, β_SVD). However, since the measurements are so reliable, it may be
feasible to sacrifice some SNR by reducing the number of stimulus repetitions in order to
present all stimuli required to directly measure the full MTFst (Versnel et al., 2002).

Finally, we note that the TORC approach is not limited to the particular stimuli used
in this study. Any combination of spectrotemporal modulation frequencies could exist in
each stimulus, provided that they are temporally orthogonal. Therefore one can produce
“super” TORCs, using fewer (but longer-duration) stimuli, each of which contains many
spectrotemporal modulation frequencies (Klein et al., 2000). These stimuli are more
noise-like but, in contrast to the STWN stimuli, benefit from a lack of stimulus-induced
systematic measurement errors. We are currently investigating the effectiveness of such
stimuli.
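
The temporal-orthogonality requirement can be illustrated with a small sketch. It assumes,
for illustration only, that each component's temporal envelope is a sinusoid at a distinct
harmonic of a common stimulus period; the period, sampling rate, modulation rates, and
phases below are hypothetical values, not those of the stimuli used in this study.

```python
import numpy as np

period = 0.25                        # s, hypothetical common stimulus period
fs = 1000                            # Hz, hypothetical sampling rate
t = np.arange(int(period * fs)) / fs
rates = [4.0, 8.0, 12.0, 16.0]       # Hz, distinct harmonics of 1/period

# One temporal envelope per component, each with an arbitrary phase.
envelopes = np.array([np.cos(2 * np.pi * w * t + 0.3 * i)
                      for i, w in enumerate(rates)])

# Time-averaged inner products between all pairs of envelopes.
gram = envelopes @ envelopes.T / t.size
off_diag = gram - np.diag(np.diag(gram))
```

Because the off-diagonal entries vanish (to floating-point precision) regardless of phase,
the response component at each temporal frequency can be attributed to a single stimulus
component, which is the property the TORC design relies on.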

4.3  The  SVD:  error  reduction  and  signal  loss 

In this paper, we used the SVD to reduce errors in the STRF measurements. The SVD
is ideally suited for use with the STRFs measured here, because their SVD is strongly
dominated by the lowest order terms; that is, they are well approximated by a small number
of fully separable (rank-1) matrices. When such STRFs are perturbed by unstructured errors,
the SVD is still strongly dominated by the lowest order terms, and has a well-understood
contribution from higher order terms. The boundary between the low order (high signal,
low error) and high order (low signal, high error) terms is not known a priori, but is well
understood from signal detection theory. The upshot is that truncating the SVD series of
an STRF at low order is an efficient and well-understood way of increasing SNR while
minimizing loss of signal.
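
As a rough illustration of why low-order truncation helps, the following sketch perturbs a
hypothetical separable "true" STRF with unstructured Gaussian errors and compares the error
power before and after rank-1 truncation. The matrix size, noise level, and shapes are
arbitrary assumptions chosen for the example, not values from the measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-1 (fully separable) "true" STRF on a 20 x 20 grid.
spectral = np.sin(np.linspace(0.0, np.pi, 20))
temporal = np.exp(-np.linspace(0.0, 3.0, 20))
true_strf = np.outer(spectral, temporal)

# Measurement = true STRF plus unstructured (i.i.d. Gaussian) errors.
measured = true_strf + 0.05 * rng.standard_normal(true_strf.shape)

# Keep only the first SVD term of the measurement.
u, s, vt = np.linalg.svd(measured, full_matrices=False)
rank1 = s[0] * np.outer(u[:, 0], vt[0])

err_raw = np.sum((measured - true_strf) ** 2)    # error power before truncation
err_rank1 = np.sum((rank1 - true_strf) ** 2)     # error power after truncation
```

In this toy case most of the error power lies in the discarded higher-order terms, so
`err_rank1` is much smaller than `err_raw`; for a non-separable STRF the same truncation
would also discard signal.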

Of the STRF measurements that were suitably error-free, more than half were not only
optimally approximated but well approximated (as reflected by β_SVD) by fully separable
(rank-1) matrices. These approximations reduced the error power by at least a factor of
3 while sacrificing less than a tenth of the signal power. The rest of the STRFs required
two SVD terms (rank-2 approximations); using only one SVD term (rank-1) would give an
incomplete view of the system functionality due to excessive signal loss. The rank-2
approximations have somewhat diminished error reduction, down to a factor of 2.
Alternatively, the quadrants of the MTFst could be approximated by fully separable matrices,
producing results very similar to the rank-2 approximations. However, if the error level
was too high (e.g., SNRcor below 1), the optimal SVD approximations no longer reliably
achieved both significant error reduction and adequate signal retention. The error level
should always be considered when interpreting the results of the SVD.

4.4  The  SVD:  functional  implications 

It is intriguing that STRFs are equally well described by rank-2 and quadrant-separable
approximations (see Figures 11H and I). These properties, each special in their own right,
do not necessarily imply one another. It turns out that if an STRF is both rank-2 and
quadrant-separable, special phase relationships must exist, in either the temporal or
spectral dimensions (or both), between the separable matrices of the SVD or equivalently
the quadrants of the MTFst. It has been demonstrated (Simon et al., subm) that AI STRFs
possess this property in the temporal dimension (but not necessarily the spectral). This
itself has strong theoretical implications for the network connectivity of those neurons.

4.5  The  error  measures 

We found that incorporating systematic errors (otherwise known as bias) into our
consideration of the total measurement error level is absolutely crucial for aligning the
results from different types of stimuli, and thus understanding the structure of an STRF
measurement (and the resulting SVD approximations, correlation coefficients, etc.)
independent of stimulus type. We used and analyzed two different measures of error: SNR and
SNRcor. SNR is the more classical but more limited of the two; SNR is the ratio of the
measured STRF power to the measured STRF variance (square of the standard error). This
definition of SNR (and its associated measure of error) is not able to incorporate
systematic error, however. In contrast, SNRcor does incorporate systematic error. SNRcor is
the ratio of measured STRF power to measured non-STRF power (e.g., the power in the
spectrotemporal region where the underlying STRF is expected to have near-zero power).
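
The SNR definition above (STRF power over squared standard error) can be sketched as
follows. This is only an illustration under stated assumptions: it treats a stack of
independent STRF estimates as given and uses their sample variance in place of the
bootstrap procedure, and the function name and array layout are hypothetical.

```python
import numpy as np

def strf_snr(estimates):
    """Mean STRF power divided by mean squared standard error.

    `estimates` has shape (n_repeats, n_freq, n_lag): a hypothetical stack
    of independent STRF estimates of the same neuron.
    """
    n_repeats = estimates.shape[0]
    mean_strf = estimates.mean(axis=0)
    # Squared standard error of the mean at each spectrotemporal point.
    sem_sq = estimates.var(axis=0, ddof=1) / n_repeats
    return float(np.mean(mean_strf ** 2) / np.mean(sem_sq))

# Toy usage: four estimates scattered by +/-0.1 around a flat "STRF" of ones.
base = np.ones((2, 3))
toy_estimates = np.stack([base + 0.1, base - 0.1, base + 0.1, base - 0.1])
toy_snr = strf_snr(toy_estimates)
```

Because this quantity depends only on the scatter across repeats, any error shared by all
repeats (a bias) leaves it unchanged, which is the limitation noted above.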

The only problem with SNRcor is that it requires assumptions about the structure of
the errors and the STRF, which may not apply to all STRFs and stimuli. In particular,
we assumed (based primarily on observations) that errors are evenly distributed over the
measurements, and that the STRF power is near zero for τ above 125 ms (using negative τ’s
is no different since the stimuli were periodic). The usefulness and predictability of
SNRcor demonstrated that these assumptions largely held for the TORC and STWN measurements.
This was not the case for the dynamic-ripple measurements, however, likely due to a
combination of response nonlinearity and the nonlinearity of the STRF measurement itself,
which distributes the errors non-uniformly in the spectrotemporal (and modulation
frequency) domain. It will be even more useful in the future to devise measures of the
systematic errors that are less dependent on the structure of the STRF measurement.


4.6  Response  variability 

In our investigation of non-systematic errors in the STRF measurements, several
observations concerning the variability of AI responses have interesting functional
implications. For example, the fact that the response variance could be linearly predicted
from the average spike rate in a nearly stimulus-independent manner points to a Poisson-like
spike generation mechanism, which has been vigorously investigated in the visual system
(Shadlen and Newsome, 1998). Additionally, we found (see Figure 10C) that while neurons with
higher-power STRFs (higher P) tended to fire more spikes (higher r, as might be expected
from a linear-plus-rectification response model), a range of average spike rates were still
observed for any given STRF power. Neurons with the lowest spike rates (for a given P)
corresponded to the highest-SNR STRFs, and had the sharpest, most phase-locked responses of
the population (not shown). Neurons with the highest spike rates often had seemingly random
responses and poor-quality STRF measurements. We will consider the origins and implications
of such behavior more carefully in future studies.
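
One way to see why a Poisson-like mechanism would make the variance of the response's
Fourier transform linear in the average spike rate is the following short derivation. It
assumes, as an idealization, independent Poisson spike counts c_t with expected values
λ_t in each of N time bins; these symbols are introduced here for illustration only.

```latex
X_k = \sum_{t=0}^{N-1} c_t \, e^{-2\pi i k t / N},
\qquad
\operatorname{Var}[X_k]
  = \sum_{t=0}^{N-1} \operatorname{Var}[c_t] \,
    \bigl| e^{-2\pi i k t / N} \bigr|^{2}
  = \sum_{t=0}^{N-1} \lambda_t
  = N r,
\qquad
r = \frac{1}{N} \sum_{t=0}^{N-1} \lambda_t .
```

Under these assumptions the variance is the same at every frequency k and depends on the
stimulus only through the average rate r, consistent with a nearly stimulus-independent
linear relationship.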

4.7  Related  studies 

Other recent studies have also addressed the similarity of STRF measurements with different
types of stimuli, albeit in different auditory loci. Escabi and Schreiner (2002) measured
STRFs in cat inferior colliculus (IC) with stochastic stimuli that in some respects
resemble the dynamic-ripple stimuli and STWN used here. While their results largely agree
with ours, they singled out a small group of neurons that either exhibited extremely
selective and phase-locked responses to the dynamic-ripple-like stimuli but were
unresponsive to the STWN-like stimuli (type-II neurons), or exhibited non-phase-locked
nonlinear responses to both stimuli (type-III neurons). As discussed above, in AI we also
find that neurons’ responses can be extremely sparse and yet yield significant STRFs (like
their type-II neurons). However, we did not observe two distinct populations of neurons;
rather, the degree of phase locking in response to all stimuli ranged over a continuum. In
addition, some AI neurons exhibited significant spike rates but poor STRF measurements
(like their type-III neurons). Although we have not yet found a nonlinear relationship
between these responses and the dynamic spectra of the stimuli, we cannot yet rule out that
possibility. In another study, Theunissen et al. (2000) measured STRFs in the zebra finch
auditory forebrain in response to random tone sequences and bird songs, and used the STRF
from one stimulus to predict the responses to the other. They found small but significant
differences in the cross-predictability of the responses, which was poor overall. These
differences either reflect differences in the STRF-measurement method (which was there a
nonlinear function of the responses), or more probably reflect a higher degree of
nonlinearity in the responses of neurons in the avian auditory forebrain with respect to
mammalian AI (but see Schafer et al., 1992).

4.8  Nonlinearity 

This  article  has  been  concerned  with  nonlinearities  only  insofar  as  they  interfere  with  the 
STRF  measurement,  and  methods  were  invoked  to  reduce  this  interference  (e.g.,  the  inverse- 
repeat  method).  Other  methods  are  also  available,  such  as  more  carefully  choosing  the 
temporal  modulation  frequencies  in  the  TORCs,  so  that  the  nonlinear  distortion  products 
are  also  orthogonal  to  the  linear  response  (à  la  Victor  and  Shapley,  1980).  This  is  not  to 
say  that  nonlinearities  form  an  insignificant  part  of  the  AI  response,  merely  that  linearity 
is  important,  strong,  and  robust  to  changing  stimulus  conditions,  and  therefore  forms  an 
sturdy  foundation  upon  which  the  study  of  auditory  cortical  processing  can  be  based,  even 
in  its  nonlinear  aspects. 
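
The  idea  of  choosing  temporal  modulation  frequencies  so  that  distortion  products  avoid  the 
linear  response  can  be  illustrated  with  a  small  check  of  second-order  products.  This  is  a 
sketch  only:  the  frequency  sets  below  are  illustrative  values,  not  those  used  in  this  study, 
and  only  second-order  (pairwise  sum  and  difference)  products  are  considered. 

```python
from itertools import combinations_with_replacement

def second_order_products(freqs):
    """All pairwise sum and nonzero difference frequencies."""
    products = set()
    for f1, f2 in combinations_with_replacement(freqs, 2):
        products.add(f1 + f2)
        if f1 != f2:
            products.add(abs(f1 - f2))
    return products

def overlapping(freqs):
    """Stimulus frequencies that coincide with a second-order product."""
    return sorted(set(freqs) & second_order_products(freqs))

# Illustrative sets (assumed): all harmonics of 4 Hz versus odd harmonics only.
harmonic = [4, 8, 12, 16, 20, 24]
odd_only = [4 * k for k in (1, 3, 5, 7, 9, 11)]

print(overlapping(harmonic))  # every component is contaminated
print(overlapping(odd_only))  # no component is contaminated
```

With  odd  harmonics  only,  every  pairwise  sum  or  difference  is  an  even  harmonic,  so 
second-order  distortion  cannot  fall  on  a  frequency  used  to  estimate  the  linear  response. 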

We  are  currently  investigating  several  anticipated  nonlinearities.  These  include  the 
nonlinear  transformation  of  responses  that  occurs  at  the  depressing  thalamo-cortical  synapse,  which 
contributes  a  rapid  adaptation  of  onset  responses  towards  a  steady  state  within  a  few  tens  of 
milliseconds  (Denham,  2001;  Kowalski  et  al.,  1996a;  Phillips  et  al.,  2002;  Heil,  1997)  (we 
considered  only  the  steady-state  response  in  this  study).  Additionally,  we  have  observed  that 
when  stimuli  contain  both  low  and  high  modulation  frequencies,  AI  responses  can  phase 
lock  to  much  higher  frequencies  than  previously  expected  (e.g.,  100-200  Hz)  (Elhilali  et  al., 
2004).  Similar  effects  have  been  observed  in  the  visual  system  (Bair  and  Koch,  1996;  Reid 
et  al.,  1992;  Chance  et  al.,  1998).  In  our  stimuli,  these  high  modulation  frequencies  result 
from  interactions  between  unresolved  AM  tones  (that  fall  within  the  bandwidth  of  the  same 
cochlear  filter),  even  though  they  were  not  part  of  the  target  dynamic  spectrum  (and 
therefore  did  not  contribute  to  the  STRF  measurement).  A  third  nonlinearity  is  the  potential 
dependence  of  responses  on  the  bandwidth  of  the  stimulus.  Broadband  sustained  stimuli 
(such  as  the  ripples,  TORCs,  and  STWN)  likely  bias  cortical  cells  in  a  manner  different 
from  that  of  narrowband  or  transient  stimuli  such  as  tones  and  clicks.  Consequently, 
predicting  details  of  tone  and  click  responses  from  the  STRF  may  sometimes  prove  problematic 
(Kowalski  et  al.,  1996b;  Theunissen  et  al.,  2000).  However,  this  nonlinearity  is  irrelevant 
when  the  focus  is  on  comparing  STRFs  derived  from  similarly  broadband  and  sustained 
stimuli,  as  is  the  case  in  this  paper.  Yet  another  important  source  of  nonlinear  effects  is 
static  nonlinearities  (e.g.,  rectification,  response  saturation)  with  respect  to  stimulus  level 
and  contrast.  By  fixing  stimulus  contrast  at  near  maximum  (90%),  and  the  absolute  level 
at  an  intermediate  value  (e.g.,  based  on  the  rate-level  function  (Kowalski  et  al.,  1996b)),  we 
have  managed  to  obtain  reliable,  reproducible  results  from  a  sizable  proportion  of  cells  in  AI. 
Finally,  there  are  fundamental  nonlinearities  that  we  have  not  yet  convincingly  observed  in 
AI  responses,  such  as  units  analogous  to  the  complex  cells  of  the  visual  cortex  (De  Valois 
and  De  Valois,  1990).  Nevertheless,  it  is  likely  that  a  significant  proportion  of 
the  very  low  SNR  STRFs  observed  in  this  study  belong  to  cells  that  would  be  classified  as 
nonlinear  in  that  they  either  phase-lock  poorly  to  our  stimuli  or  respond  to  more  complex 
patterns  that  we  have  not  been  able  to  probe  (e.g.,  see  Escabí  and  Schreiner,  2002). 
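
The  way  unresolved  tones  can  give  rise  to  modulation  frequencies  absent  from  the  target 
dynamic  spectrum,  as  noted  earlier  in  this  section,  can  be  illustrated  numerically.  The 
sketch  below  is  an  assumption-laden  toy,  not  a  cochlear  model:  two  equal-amplitude  tones 
are  taken  to  pass  through  the  same  filter  unchanged,  a  square-law  device  stands  in  for 
rectification,  and  the  tone  frequencies  and  sampling  rate  are  arbitrary. 

```python
import numpy as np

fs = 16000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of samples
f1, f2 = 2000.0, 2150.0         # two tones inside one hypothetical filter band

# Sum of the two unresolved tones at the filter output.
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Square-law rectification: produces a component at the difference
# frequency |f2 - f1| plus components near 2*f1, f1 + f2, and 2*f2.
rectified = x ** 2

# Locate the dominant low-frequency (envelope) component.
spectrum = np.abs(np.fft.rfft(rectified - rectified.mean()))
freqs = np.fft.rfftfreq(len(rectified), d=1 / fs)
low_band = freqs < 500
beat_hz = freqs[low_band][np.argmax(spectrum[low_band])]
```

The  envelope  component  appears  at  the  150  Hz  difference  frequency  even  though  neither 
tone  is  modulated  at  that  rate,  which  is  one  route  by  which  modulations  in  the  100-200  Hz 
range  could  reach  the  cortex  without  being  part  of  the  intended  stimulus  spectrum. 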